Thanks to Chesnay for bringing up this proposal. It's good news that we will soon have applicable fine-grained recovery for batch jobs. +1 for this proposal.

Regards,
Zhu

Till Rohrmann <trohrm...@apache.org> wrote on Mon, Apr 15, 2019 at 5:57 PM:

> Thanks for summarizing the current state of FLIP-1 and outlining the way to
> move forward with it, Chesnay.
>
> I think we should implement the first version of the backtracking logic
> using the DataConsumptionException (FLINK-6227) to signal if an
> intermediate result partition has been lost.
>
> Moreover, I think it would be best to base the new implementation on the
> refined FailoverStrategy interface proposed by the scheduler refactorings
> [1]. We could have an adaptor to make it work with the existing code for
> testing purposes and until the scheduler interfaces have been introduced.
>
> Apart from that, +1 for completing Flink's first improvement proposal :-)
>
> [1]
> https://docs.google.com/document/d/1fstkML72YBO1tGD_dmG2rwvd9bklhRVauh4FSsDDwXU/edit?usp=sharing
>
> Cheers,
> Till
>
> On Sun, Apr 14, 2019 at 8:20 PM Chesnay Schepler <ches...@apache.org>
> wrote:
>
> > Hello everyone,
> >
> > Till, Zhu Zhu and I have prepared a Design Document
> > <https://docs.google.com/document/d/1YHOpMLdC-dtgjcM-EDn6v-oXgsEQKXSoMjqRcYVbJA8>
> > for introducing backtracking for failover regions. This is an
> > optimization of the failure handling logic for jobs with blocking result
> > partitions (which primarily exist in batch jobs), where only part of the
> > job has to be restarted.
> > This is a continuation of the FLIP-1
> > <https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures>
> > efforts to introduce fine-grained recovery from task failures.
> > The associated JIRA can be found here
> > <https://issues.apache.org/jira/browse/FLINK-12068>.
> >
> > Any feedback is highly appreciated.
> >
> > Regards,
> > Chesnay