Thomas, I did it to maintain the semantics of Checkpoint. If we need to change this then I can change the implementation.

Thanks
- Gaurav

> On Nov 22, 2015, at 10:11 PM, Thomas Weise <[email protected]> wrote:
>
> You can only perform such an operation in committed. Anything done in
> checkpointed can be repeated (until it becomes a recovery checkpoint).
>
> On Sun, Nov 22, 2015 at 10:02 PM, Gaurav Gupta <[email protected]> wrote:
>
>> Thomas,
>>
>> This was done to preserve checkpointing semantics, that is, to tell the
>> operator that its state is preserved. Say a database is updated or files
>> are moved in the checkpointed call but the state copy fails; how do we
>> address such scenarios?
>>
>> Thanks
>> - Gaurav
>>
>>> On Nov 22, 2015, at 9:44 PM, Thomas Weise <[email protected]> wrote:
>>>
>>> Alternatively I would ask why the checkpointed callback needs to wait
>>> until the data was copied to HDFS instead of upon completion of the
>>> state serialization.
>>>
>>> Thomas
>>>
>>> On Sun, Nov 22, 2015 at 9:41 PM, Chandni Singh <[email protected]> wrote:
>>>
>>>> Gaurav,
>>>>
>>>> My question is about why Async was made the default when it changed
>>>> the semantics of operator callbacks. Your response doesn't answer that.
>>>>
>>>> In a way we broke backward compatibility.
>>>>
>>>> Chandni
>>>>
>>>> On Sun, Nov 22, 2015 at 9:22 PM, Gaurav Gupta <[email protected]> wrote:
>>>>
>>>>> The idea behind Async checkpointing is to unblock the operator while
>>>>> the state is getting transferred to HDFS.
>>>>> Just to clarify: beginWindow(x) -> endWindow(x) -> checkpointed(x-1)
>>>>> is the ideal sequence, but if HDFS is slow, or transferring the state
>>>>> to HDFS is slow for some other reason, this sequence may not hold
>>>>> true.
>>>>>
>>>>> Can your use case be addressed by
>>>>> https://malhar.atlassian.net/browse/APEX-78?
>>>>>
>>>>> Thanks
>>>>> - Gaurav
>>>>>
>>>>>> On Nov 22, 2015, at 3:56 PM, Chandni Singh <[email protected]> wrote:
>>>>>>
>>>>>> With Async checkpointing the checkpointed callback in
>>>>>> CheckpointListener is called for a previous window, that is,
>>>>>> beginWindow(x) -> endWindow(x) -> checkpointed(x-1)
>>>>>>
>>>>>> This feature was newly introduced. With synchronous checkpointing,
>>>>>> the behavior was always
>>>>>> beginWindow(x) -> endWindow(x) -> checkpointed(x)
>>>>>>
>>>>>> A lot of operators were written before asynchronous checkpointing
>>>>>> was introduced, and a few of them may rely on the sequencing
>>>>>> guaranteed by synchronous checkpointing.
>>>>>>
>>>>>> So why was Async checkpointing made the default?
>>>>>>
>>>>>> With how Async checkpointing works today, the complexity of handling
>>>>>> transient state in the checkpointed callback falls on every
>>>>>> operator. For example, let's say earlier I had a transient map which
>>>>>> I cleared every time checkpointed was called; with async
>>>>>> checkpointing this simple task will be a lot more complicated.
>>>>>>
>>>>>> I think Async checkpointing broke the semantics of operator
>>>>>> callbacks and should NOT be the default.
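
For reference, here is a minimal sketch of one way an operator could cope with the transient-map case raised in the original message when checkpointed(windowId) may arrive for an earlier window. It is an illustration only: WindowedTransientState and pendingWindows are hypothetical names, the class is a plain-Java stand-in rather than the Apex operator API, and it assumes window ids increase monotonically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-in for an operator; it does not use the Apex API.
class WindowedTransientState {
    // Transient entries grouped by the window that produced them.
    // Assumption: window ids increase monotonically.
    private final TreeMap<Long, List<String>> byWindow = new TreeMap<>();
    private long currentWindow;

    void beginWindow(long windowId) {
        currentWindow = windowId;
    }

    void process(String tuple) {
        byWindow.computeIfAbsent(currentWindow, w -> new ArrayList<>()).add(tuple);
    }

    void endWindow() {
        // Nothing to do in this sketch.
    }

    // The callback may refer to an earlier window than the current one,
    // so drop only the entries for windows up to and including windowId.
    void checkpointed(long windowId) {
        byWindow.headMap(windowId, true).clear();
    }

    int pendingWindows() {
        return byWindow.size();
    }

    public static void main(String[] args) {
        WindowedTransientState state = new WindowedTransientState();
        state.beginWindow(1); state.process("a"); state.endWindow();
        state.beginWindow(2); state.process("b"); state.endWindow();
        state.checkpointed(1); // arrives after window 2 has ended
        System.out.println("windows still pending: " + state.pendingWindows());
    }
}
```

Keying the transient entries by window id means a late checkpointed call cannot discard data from windows that have not been checkpointed yet. Side effects that must not be repeated (database updates, file moves) would still belong in committed, as noted in the thread.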
