[GitHub] [flink] curcur edited a comment on pull request #13648: [FLINK-19632] Introduce a new ResultPartitionType for Approximate Local Recovery

GitBox Thu, 22 Oct 2020 09:11:11 -0700


curcur edited a comment on pull request #13648:
URL: https://github.com/apache/flink/pull/13648#issuecomment-714599452



   Thanks, @rkhachatryan for clarifying the problem! 
   I agree there is a problem if a downstream task continuously fails multiple 
times, or an orphan task execution may exist for a short period of time after 
new execution is running (as described in the FLIP).
   
   Here is an idea of how to cleanly and thoroughly solve this kind of problem:
   1. We go with the simplified release view version: only release view before 
a new creation (in thread2). That says we won't clean up view when downstream 
task disconnects (`releaseView` would not be called from the reference copy of 
view) (in thread1 or 2).
      - This would greatly simplify the threading model
      - This won't cause any resource leak, since view release is only to 
notify the upstream result partition to releaseOnConsumption when all 
subpartitions are consumed in PipelinedSubPartitionView. In our case, we do not 
release result partition on consumption any way (the result partition is put in 
track in JobMaster, similar to the ResultParition.blocking Type).
   
   2. Each view is associated with a downstream task execution version
      - This is making sense because we actually have different versions of 
view now, corresponding to the vertex.version of the downstream task.
      - createView is performed only if the new version to create is greater 
than the existing one
      - If we decide to create a new view, the old view's parent (subpartition) 
is set --> invalid
   
   I think this way, we can completely disconnect the old view with the 
subpartition. Besides that, the working handler in use would always hold the 
freshest view reference.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [flink] curcur edited a comment on pull request #13648: [FLINK-19632] Introduce a new ResultPartitionType for Approximate Local Recovery

Reply via email to