[
https://issues.apache.org/jira/browse/FLINK-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16659655#comment-16659655
]
ASF GitHub Bot commented on FLINK-10205:
----------------------------------------
isunjin commented on issue #6684: [FLINK-10205] Batch Job: InputSplit Fault
tolerant for DataSource…
URL: https://github.com/apache/flink/pull/6684#issuecomment-431950619
@tillrohrmann
>The thing I'm questioning is whether the InputSplits of the failed task
need to be processed by the same (restarted) task or can be given to any
running task.
Agree.
I think the failed task's splits **don't** necessarily need to be processed by the
same task (ExecutionVertex).
> So far I'm not convinced that something would break if we simply return
the InputSplits to the InputSplitAssigner
Agree.
I think ```simply return the InputSplits to the InputSplitAssigner``` would
work; the point is how to make it work.
Restarting the entire graph calls ExecutionJobVertex.resetForNewExecution,
which creates a new ```InputSplitAssigner``` and thereby "returns" all
```InputSplits``` to the ```InputSplitAssigner```.
My point is that for fine-grained failover we might not want to return all
```InputSplits```, but just the failed ```InputSplits```. However, currently
not every subclass of InputSplitAssigner has the logic to ```simply return the
InputSplits to the InputSplitAssigner```, such as
```LocatableInputSplitAssigner``` or any other customized
```InputSplitAssigner```.
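To illustrate the missing piece, here is a minimal sketch of what a "returnable" assigner could look like. All names here (```ReturnableSplitAssigner```, ```returnInputSplits```) are illustrative assumptions, not Flink's actual API; the point is only that the assigner would need an explicit re-queue operation that subclasses like ```LocatableInputSplitAssigner``` do not currently offer:

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;

// Illustrative stand-in for Flink's InputSplit; not the real interface.
interface InputSplit {
    int getSplitNumber();
}

// Hypothetical assigner that supports handing failed splits back so that
// ANY running task, not just the restarted one, can pick them up again.
class ReturnableSplitAssigner {
    private final Deque<InputSplit> unassigned = new ArrayDeque<>();

    ReturnableSplitAssigner(Collection<InputSplit> splits) {
        unassigned.addAll(splits);
    }

    /** Hand out the next split, or null when all splits are assigned. */
    synchronized InputSplit getNextInputSplit() {
        return unassigned.poll();
    }

    /** Re-queue the splits of a failed task (the operation missing today). */
    synchronized void returnInputSplits(Collection<InputSplit> splits) {
        unassigned.addAll(splits);
    }
}
```

A returned split simply goes back into the unassigned pool, which is where the exactly-once concern below comes from: this method must be called exactly once per failed split.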
```simply return the InputSplits to the InputSplitAssigner``` also implies a
transaction between task and JobManager (maybe multiple transactions): we need
to make sure the ```InputSplits``` are returned to the ```InputSplitAssigner```
exactly once. What happens if we have speculative execution, where two tasks
consume the same set of InputSplits but do not fail at the same time? Does
every InputSplitAssigner need to keep a list to deduplicate? What happens if
the TM dies or has a network issue and an InputSplit cannot be returned?
Saving the ```InputSplits``` in the ExecutionVertex is one way to "return" them
to the ```InputSplitAssigner```. The "side effect" of this implementation is
that the ```InputSplits``` will be handled by the same task (ExecutionVertex),
but it seems a simple and safe way to implement ```simply return the
InputSplits to the InputSplitAssigner``` transactionally.
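The save-in-the-vertex idea can be sketched as follows. This is an assumption-laden illustration (```SplitCachingVertex``` and the plain iterator-based assigner are invented names, not Flink classes): the vertex caches every split it pulled, and a restarted attempt replays the cached splits before asking the assigner for new ones, so no split ever has to travel back to the JobManager:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative sketch: a vertex that remembers its assigned splits so a
// restarted execution attempt deterministically re-reads the same splits.
class SplitCachingVertex {
    private final List<String> assignedSplits = new ArrayList<>();
    private final Iterator<String> assigner; // stand-in for the real assigner
    private int replayPosition = 0;

    SplitCachingVertex(Iterator<String> assigner) {
        this.assigner = assigner;
    }

    /** Replay cached splits first; otherwise pull a new one and cache it. */
    String getNextInputSplit() {
        if (replayPosition < assignedSplits.size()) {
            return assignedSplits.get(replayPosition++);
        }
        if (assigner.hasNext()) {
            String split = assigner.next();
            assignedSplits.add(split);
            replayPosition++;
            return split;
        }
        return null; // no more splits
    }

    /** Called on task failure: the next attempt re-reads the same splits. */
    void resetForNewExecution() {
        replayPosition = 0;
    }
}
```

Note how the "transaction" disappears: the splits never leave the vertex, so there is nothing to return exactly once, at the cost of binding the splits to that one ExecutionVertex.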
@tillrohrmann, the above is my understanding; let me know if we are on the
same page. I would be happy to redo this if you have any other suggestions.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> Batch Job: InputSplit Fault tolerant for DataSourceTask
> -------------------------------------------------------
>
> Key: FLINK-10205
> URL: https://issues.apache.org/jira/browse/FLINK-10205
> Project: Flink
> Issue Type: Sub-task
> Components: JobManager
> Affects Versions: 1.6.1, 1.6.2, 1.7.0
> Reporter: JIN SUN
> Assignee: JIN SUN
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.7.0
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> Today the DataSourceTask pulls InputSplits from the JobManager to achieve better
> performance. However, when a DataSourceTask fails and reruns, it will not get
> the same splits as its previous attempt. This can introduce inconsistent
> results or even data corruption.
> Furthermore, if two executions run at the same time (in a batch
> scenario), these two executions should process the same splits.
> We need to fix this issue to make the inputs of a DataSourceTask
> deterministic. The proposal is to save all splits into the ExecutionVertex and
> have the DataSourceTask pull splits from there.
> document:
> [https://docs.google.com/document/d/1FdZdcA63tPUEewcCimTFy9Iz2jlVlMRANZkO4RngIuk/edit?usp=sharing]
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)