@isunjin in the issue/design doc you talk about potential data inconsistency/corruption that this PR is trying to fix. However, I wonder what sort of corruption you have in mind that is fixed here. Can you provide a concrete example of a problematic case? In my understanding, graph components are either connected and need a connected restart, or they are independent and can recover in a fine-grained way, but then it should also not matter in which order splits are reprocessed.
Besides that, I wonder if the general approach is a good fit for the current and future architecture of this component. In particular, we pull the concern of `InputSplit` down to the level of `Executions`. `Execution` and `ExecutionJobVertex` are used in both batch and streaming, and to me it does not seem like a good step to introduce batch-specific code into those classes if we can avoid it. Another thing I question is whether it would make sense to think about a way that also allows us to release the assignment of an input split to a certain task, so that another task can pick it up if there is a longer-lasting problem with the original task. Lastly, we are currently thinking about a general redesign of the source interface and how input is assigned to the source instances. @aljoscha has a WIP branch to experiment with the possible changes here: https://github.com/aljoscha/flink/tree/refactor-source-interface. We should also keep in mind that sources might be split into two operators in the future.
[ Full content available at: https://github.com/apache/flink/pull/6684 ]
