[ 
https://issues.apache.org/jira/browse/FLINK-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16638685#comment-16638685
 ] 

ASF GitHub Bot commented on FLINK-10205:
----------------------------------------

isunjin commented on issue #6684:     [FLINK-10205] Batch Job: InputSplit Fault 
tolerant for DataSource…
URL: https://github.com/apache/flink/pull/6684#issuecomment-427127694
 
 
   @StefanRRichter thanks for the comments.
   - For the inconsistency issue, 
[this](https://github.com/isunjin/flink/commit/b61b58d963ea11d34e2eb7ec6f4fe4bfed4dca4a)
 is the repro. The logic is simple: we throw an exception in the WordCount 
example and use restartRegion as the failover strategy. The job is expected to 
fail, but it succeeds with an incorrect result. The reason is that on restart 
the task calls requestNextSplit again, which returns nothing because the splits 
were already drained. With no split to read, the flatMap method is never 
executed and the exception is never thrown.
   
   - The goal of the general approach is to make the "deterministic behavior" 
assumption hold as much as possible, since determinism is crucial for 
failover. The code is not meant to introduce "determinism" for 
DataSourceTask; right now DataSourceTask is only used in the batch scenario. For 
the streaming scenario, it will work once we treat the splitIndex as state.
   
   - For load balancing, I think the first priority is to make the data 
consistent; we can certainly add more logic to make it more efficient.
   
   - Thanks for letting me know. However, this is a bug right now and it is 
actually blocking me from moving forward. We can refactor this code if we end 
up with a fundamentally different design.
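
   To make the first point concrete, here is a minimal sketch of the idea of 
remembering which splits each subtask received, so that a restarted attempt 
replays the same splits instead of getting an empty answer. This is only an 
illustration and not the code in the PR: `ReplayableSplitAssigner`, 
`resetForRestart`, and the use of plain strings as splits are all hypothetical, 
and it tracks one attempt per subtask at a time rather than concurrent 
executions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Queue;

/** Hypothetical sketch, not Flink's actual API: replays a subtask's splits after a restart. */
final class ReplayableSplitAssigner {
    // Splits that no subtask has received yet.
    private final Queue<String> unassigned;
    // Splits already handed to each subtask, in the order they were handed out.
    private final Map<Integer, List<String>> history = new HashMap<>();
    // How far the current attempt of each subtask has read into its own history.
    private final Map<Integer, Integer> cursor = new HashMap<>();

    ReplayableSplitAssigner(List<String> splits) {
        this.unassigned = new ArrayDeque<>(splits);
    }

    /** Returns the next split for the subtask, replaying recorded splits before pulling new ones. */
    synchronized Optional<String> requestNextSplit(int subtask) {
        List<String> seen = history.computeIfAbsent(subtask, k -> new ArrayList<>());
        int pos = cursor.getOrDefault(subtask, 0);
        if (pos == seen.size()) {
            // Nothing left to replay: take a fresh split from the shared pool, if any.
            String next = unassigned.poll();
            if (next == null) {
                return Optional.empty();
            }
            seen.add(next);
        }
        cursor.put(subtask, pos + 1);
        return Optional.of(seen.get(pos));
    }

    /** Called when a subtask is restarted: the new attempt starts again from its first split. */
    synchronized void resetForRestart(int subtask) {
        cursor.put(subtask, 0);
    }
}
```

   With this shape, a subtask that drained its splits and then failed would, 
after `resetForRestart`, see the same splits in the same order, so the user 
function runs again and the failure is reproduced instead of being silently 
skipped.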
   
   
    

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Batch Job: InputSplit Fault tolerant for DataSourceTask
> -------------------------------------------------------
>
>                 Key: FLINK-10205
>                 URL: https://issues.apache.org/jira/browse/FLINK-10205
>             Project: Flink
>          Issue Type: Sub-task
>          Components: JobManager
>    Affects Versions: 1.6.1, 1.7.0, 1.6.2
>            Reporter: JIN SUN
>            Assignee: JIN SUN
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.7.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Today DataSourceTask pulls InputSplits from the JobManager to achieve better 
> performance. However, when a DataSourceTask fails and is rerun, it will not get 
> the same splits as its previous attempt. This can produce inconsistent 
> results or even data corruption.
> Furthermore, if two executions run at the same time (in the batch 
> scenario), these two executions should process the same splits.
> We need to fix the issue to make the inputs of a DataSourceTask 
> deterministic. The proposal is to save all splits in the ExecutionVertex and 
> have the DataSourceTask pull splits from there.
> Document:
> [https://docs.google.com/document/d/1FdZdcA63tPUEewcCimTFy9Iz2jlVlMRANZkO4RngIuk/edit?usp=sharing]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
