Thanks to Chesnay for bringing up this proposal.
It's good news that we will soon have applicable fine-grained recovery for
batch jobs.
+1 for this proposal.

Regards,
Zhu

Till Rohrmann <trohrm...@apache.org> wrote on Mon, Apr 15, 2019 at 5:57 PM:

> Thanks for summarizing the current state of FLIP-1 and outlining the way to
> move forward with it, Chesnay.
>
> I think we should implement the first version of the backtracking logic
> using the DataConsumptionException (FLINK-6227) to signal if an
> intermediate result partition has been lost.
>
> Moreover, I think it would be best to base the new implementation on the
> refined FailoverStrategy interface proposed by the scheduler refactorings
> [1]. We could have an adaptor to make it work with the existing code for
> testing purposes and until the scheduler interfaces have been introduced.
>
> Apart from that, +1 for completing Flink's first improvement proposal :-)
>
> [1]
> https://docs.google.com/document/d/1fstkML72YBO1tGD_dmG2rwvd9bklhRVauh4FSsDDwXU/edit?usp=sharing
>
> Cheers,
> Till
>
> On Sun, Apr 14, 2019 at 8:20 PM Chesnay Schepler <ches...@apache.org> wrote:
>
> > Hello everyone,
> >
> > Till, Zhu Zhu and I have prepared a Design Document
> > <https://docs.google.com/document/d/1YHOpMLdC-dtgjcM-EDn6v-oXgsEQKXSoMjqRcYVbJA8>
> > for introducing backtracking for failover regions. This is an
> > optimization of the failure handling logic for jobs with blocking result
> > partitions (which primarily exist in batch jobs), where only part of the
> > job has to be restarted.
> > This is a continuation of the FLIP-1
> > <https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures>
> > efforts to introduce fine-grained recovery from task failures.
> > The associated JIRA can be found here
> > <https://issues.apache.org/jira/browse/FLINK-12068>.
> >
> > Any feedback is highly appreciated.
> >
> > Regards,
> > Chesnay
> >
>
