isunjin removed a comment on issue #6684:     [FLINK-10205] Batch Job: 
InputSplit Fault tolerant for DataSource…
URL: https://github.com/apache/flink/pull/6684#issuecomment-431734239
 
 
   Great discussion, thanks everybody. 
   @wenlong88, the scenario you mention is exactly what I am trying to fix. 
[Here](https://github.com/isunjin/flink/commit/b61b58d963ea11d34e2eb7ec6f4fe4bfed4dca4a)
 is a concrete example: during failover, a simple word count job produces 
inconsistent data; the job should fail, but instead it succeeds with zero output.
   
   @tillrohrmann, **_InputSplitAssigner_** generates a list of _**InputSplit**_s. 
The order may not matter, but every split should be processed exactly once: if a 
task fails while processing an _**InputSplit**_, that _**InputSplit**_ should be 
processed again. In the batch scenario, however, this does not hold. 
[This](https://github.com/isunjin/flink/commit/b61b58d963ea11d34e2eb7ec6f4fe4bfed4dca4a)
 repro shows that the current codebase lacks this logic and therefore has a 
data inconsistency issue.
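   To illustrate the idea (this is a hypothetical sketch, not Flink's actual 
`InputSplitAssigner` API): an assigner that hands back a failed task's in-flight 
split guarantees every split is still processed, whereas simply dropping it 
yields the silent data loss described above. The class and method names below 
are invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: a split assigner that returns a failed task's
// in-flight split to the queue so that every split is processed.
public class ReassigningSplitAssigner {
    private final Deque<String> pending = new ArrayDeque<>();

    public ReassigningSplitAssigner(List<String> splits) {
        pending.addAll(splits);
    }

    // Called by a task requesting work; null means no splits are left.
    public String getNextSplit() {
        return pending.pollFirst();
    }

    // On task failure, the in-flight split must be handed back;
    // otherwise the job "succeeds" with missing output.
    public void returnSplit(String split) {
        pending.addFirst(split);
    }

    public static void main(String[] args) {
        ReassigningSplitAssigner assigner =
                new ReassigningSplitAssigner(List.of("split-0", "split-1"));

        List<String> processed = new ArrayList<>();
        String inFlight = assigner.getNextSplit(); // task picks up split-0
        assigner.returnSplit(inFlight);            // task fails: hand it back

        // The restarted task drains the queue; nothing is lost.
        for (String s = assigner.getNextSplit(); s != null;
                s = assigner.getNextSplit()) {
            processed.add(s);
        }
        System.out.println(processed); // [split-0, split-1]
    }
}
```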
   
   It is not a problem in the streaming scenario, because there the 
_**InputSplit**_ is treated as a record; e.g., _**ContinuousFileMonitoringFunction**_ 
collects _**InputSplit**_s, and Flink guarantees that every _**InputSplit**_ is 
processed exactly once. @wenlong88, would this work in your scenario? 
   
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
