Re: Checkpoint / Two Phase Commit

Piotr Nowojski Sun, 02 Jun 2019 23:40:41 -0700

Hi,

Sorry for the late answer.


> As I understand how it operates, the pre-phase state is when the checkpoint
> is initiated and the checkpoint barrier advances from source to sink. Once
> the pre-phase is complete (and successful), then the next step in the
> process is where the "Sink" operator is expected to "Commit" the transaction
> (the data that was part of the checkpointed state). The datasource backed by
> the Sink operator is expected to commit the transaction.

Yes, you are correct. However, keep in mind that before we move to the “commit” 
phase, all of the participants in the pre-commit phase must have completed it 
successfully.
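
To illustrate that ordering, here is a minimal sketch in plain Java. It is not 
Flink's API: `Participant`, `RecordingParticipant` and 
`CheckpointCoordinatorSketch` are hypothetical names made up for this example, 
and aborting all transactions when a pre-commit fails is a simplification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface for this sketch; not part of Flink.
interface Participant {
    boolean preCommit(); // flush data so that the transaction is ready to commit
    void commit();       // make the pre-committed data visible
    void abort();        // discard the open transaction
}

// Test double that records which calls it received.
final class RecordingParticipant implements Participant {
    private final boolean preCommitSucceeds;
    final List<String> log = new ArrayList<>();

    RecordingParticipant(boolean preCommitSucceeds) {
        this.preCommitSucceeds = preCommitSucceeds;
    }

    @Override public boolean preCommit() { log.add("preCommit"); return preCommitSucceeds; }
    @Override public void commit() { log.add("commit"); }
    @Override public void abort() { log.add("abort"); }
}

final class CheckpointCoordinatorSketch {
    /** Returns true only if the commit phase was started. */
    static boolean runCheckpoint(List<? extends Participant> participants) {
        // Phase 1: every participant is asked to pre-commit.
        boolean allPreCommitted = true;
        for (Participant p : participants) {
            if (!p.preCommit()) {
                allPreCommitted = false;
            }
        }
        // A single failed pre-commit means nobody may commit.
        if (!allPreCommitted) {
            for (Participant p : participants) {
                p.abort();
            }
            return false;
        }
        // Phase 2: only now is each participant told to commit.
        for (Participant p : participants) {
            p.commit();
        }
        return true;
    }
}
```

With one participant whose pre-commit fails, `runCheckpoint` returns false and 
no participant ever sees `commit()`.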

> Say if the transaction times out and data cannot be committed, is there an
> option to roll back the checkpointed state without incurring the data loss?
> As of now, it does not work this way but I am trying to understand the
> challenges/limitations with respect to discarding the checkpoint in
> question?


No, it is impossible to roll back the data in case of a timeout, because some 
of the other actors (different parallel instances of the sink) could have 
already committed their data. Once everybody has acknowledged “pre-commit” AND 
the job master has started the “commit” phase, there is no going back: all 
commits must eventually succeed (intermittent failures are allowed) or there 
will be data loss.
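
As a rough sketch of what “intermittent failures are allowed” means for a 
single sink instance, the commit can simply be retried until it goes through. 
`CommitAction` and `CommitRetrySketch` are hypothetical names for this example 
(not Flink classes), the retry loop stands in for whatever recovery mechanism 
is actually used, and the commit is assumed to be safe to repeat.

```java
// Hypothetical functional interface for this sketch; not part of Flink.
interface CommitAction {
    void commit() throws Exception;
}

final class CommitRetrySketch {
    /**
     * Retries the commit until it succeeds or maxAttempts is exhausted.
     * Returns the number of the attempt that succeeded.
     */
    static int commitWithRetry(CommitAction action, int maxAttempts) {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                action.commit();
                return attempt; // intermittent failures before this are fine
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        // There is no rollback path here: other sink instances may already
        // have committed, so giving up means this instance's data is lost.
        throw new IllegalStateException("commit never succeeded", last);
    }
}
```

A commit that fails twice and then succeeds returns 3. A commit that keeps 
failing (for example because the external transaction has timed out) ends in 
the exception, which corresponds to the data loss case.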

> 
> One reason could be with respect to (what to do with) the subsequent
> checkpoints that might have advanced during this scenario? Anything else?

Sorry, I’m not sure if I understood the question. It depends on the external 
system, but generally, if a transaction has timed out, its data is lost, 
regardless of whether a more recent batch of records, written in a more recent 
transaction, could have been committed successfully.

Piotrek

> On 22 May 2019, at 02:30, vijikarthi <vijikar...@yahoo.com> wrote:
> 
> Question regarding end-to-end exactly once guarantee implementation using
> 2PC? 
> 
> As I understand how it operates, the pre-phase state is when the checkpoint
> is initiated and the checkpoint barrier advances from source to sink. Once
> the pre-phase is complete (and successful), then the next step in the
> process is where the "Sink" operator is expected to "Commit" the transaction
> (the data that was part of the checkpointed state). The datasource backed by
> the Sink operator is expected to commit the transaction.
> 
> Say if the transaction times out and data cannot be committed, is there an
> option to roll back the checkpointed state without incurring the data loss?
> As of now, it does not work this way but I am trying to understand the
> challenges/limitations with respect to discarding the checkpoint in
> question?
> 
> One reason could be with respect to (what to do with) the subsequent
> checkpoints that might have advanced during this scenario? Anything else?
> 
> Regards
> Vijay
> 
> 
> 
> --
> Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/
