@isunjin in the issue/design doc you describe a potential data
inconsistency/corruption that this PR is trying to fix. However, I wonder what
sort of corruption you have in mind. Can you provide a concrete example of a
problematic case? In my understanding, graph components are either connected
and need a connected restart, or they are independent and can recover in a
fine-grained manner, in which case the order in which splits are reprocessed
should not matter.

Besides that, I wonder if the general approach is a good fit for the current 
and future architecture of this component. In particular, we pull the concern 
of `InputSplit` down to the level of `Executions`. `Execution` or 
`ExecutionJobVertex` are used in batch and streaming and to me it does not seem 
like a good step to introduce batch-specific code into those classes if we can 
avoid it. Another thing I question here is whether we should also think about
a way to release the assignment of an input split to a certain task, so that
another task can pick it up in case there is a longer-lasting problem with the
original task. Last, we are currently thinking about a general redesign of the
source interface and how input is assigned to the source instances. @aljoscha
has a WIP branch to experiment with the possible changes here:
https://github.com/aljoscha/flink/tree/refactor-source-interface
We should also keep in mind that sources might be split into two operators in
the future.
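
To make the idea of releasing an assignment a bit more concrete, here is a
rough sketch of what I have in mind. It is only an illustration under my own
assumptions: `ReleasableSplitAssigner`, `nextSplit` and `release` are
hypothetical names, not existing Flink classes or methods, and tasks are
identified by a plain string. The point is that split ownership is tracked
outside of `Execution`, and that the splits of a failing task can go back to
the pool for any other task.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, not an existing Flink class: it tracks which task
// currently owns which splits, independently of Execution/ExecutionJobVertex.
final class ReleasableSplitAssigner<S> {

    // Splits that no task currently owns.
    private final Deque<S> unassigned = new ArrayDeque<>();

    // Splits handed out so far, grouped by the (hypothetical) task id.
    private final Map<String, Deque<S>> assignedByTask = new HashMap<>();

    ReleasableSplitAssigner(Iterable<S> splits) {
        for (S split : splits) {
            unassigned.addLast(split);
        }
    }

    // Hands the next free split to the given task, if there is one.
    synchronized Optional<S> nextSplit(String taskId) {
        S split = unassigned.pollFirst();
        if (split == null) {
            return Optional.empty();
        }
        assignedByTask.computeIfAbsent(taskId, k -> new ArrayDeque<>()).addLast(split);
        return Optional.of(split);
    }

    // Returns all splits of the given task to the front of the pool, keeping
    // their original order, so that any other task can pick them up.
    synchronized int release(String taskId) {
        Deque<S> owned = assignedByTask.remove(taskId);
        if (owned == null) {
            return 0;
        }
        int released = owned.size();
        while (!owned.isEmpty()) {
            unassigned.addFirst(owned.pollLast());
        }
        return released;
    }
}
```

This sketch deliberately simplifies: it treats every split a task has received
as needing reprocessing once the task is released, and it ignores locality.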

[ Full content available at: https://github.com/apache/flink/pull/6684 ]
This message was relayed via gitbox.apache.org for [email protected]
