FYI, the HDHTWriter implementation depends on the older semantics and seems to be broken now:

beginWindow(x) -> endWindow(x) -> checkpointed(x)

In its checkpointed implementation, it copies certain (transient) state and transfers it to a checkpointedWriteCache with respect to window 'x'.
With async checkpointing, the state that is transferred is much more recent than window 'x'.

Chandni

On Sun, Nov 22, 2015 at 11:04 PM, Chandni Singh <[email protected]> wrote:

> Agreed. Thomas's solution fixes the backward incompatibility. I think we
> really need to fix this.
>
> On Sun, Nov 22, 2015 at 10:23 PM, Timothy Farkas <[email protected]> wrote:
>
>> Gaurav,
>>
>> I think if the state copy fails then STRAM should roll back the operator to
>> a checkpoint that is further back than the last checkpoint. If you are
>> saying that you want to preserve the semantic that checkpointed is only
>> called after a checkpoint is completed, I would argue that that guarantee
>> is already pointless in the current implementation, since it is always
>> possible for an operator to be rolled back to a checkpoint before its last
>> completed checkpoint. So, it is already currently possible for some
>> database or file operation performed after a completed checkpoint to be
>> redone after a failure. Because of this I think Thomas's solution makes the
>> most sense. Thomas's solution would also address Chandni's original point
>> that the semantics for the checkpointed callback have been violated. There
>> are operators in our libraries that have depended on the beginWindow(x),
>> endWindow(x), and checkpointed(x) call sequence, which is now broken. We
>> should probably fix that.
>>
>> Tim
>>
>> On Sun, Nov 22, 2015 at 10:02 PM, Gaurav Gupta <[email protected]> wrote:
>>
>> > Thomas,
>> >
>> > This was done to preserve the checkpointing semantics, that is, to tell
>> > the operator that its state is preserved. Say a database is updated or
>> > files are moved in the checkpointed call but the state copy fails: how
>> > do we address such scenarios?
>> >
>> > Thanks
>> > - Gaurav
>> >
>> > > On Nov 22, 2015, at 9:44 PM, Thomas Weise <[email protected]> wrote:
>> > >
>> > > Alternatively I would ask why the checkpointed callback needs to wait
>> > > until the data was copied to HDFS instead of being invoked upon
>> > > completion of the state serialization.
>> > >
>> > > Thomas
>> > >
>> > > On Sun, Nov 22, 2015 at 9:41 PM, Chandni Singh <[email protected]> wrote:
>> > >
>> > >> Gaurav,
>> > >>
>> > >> My question is about why async was made the default when it changed
>> > >> the semantics of operator callbacks. Your response doesn't answer that.
>> > >>
>> > >> In a way we broke backward compatibility.
>> > >>
>> > >> Chandni
>> > >>
>> > >> On Sun, Nov 22, 2015 at 9:22 PM, Gaurav Gupta <[email protected]> wrote:
>> > >>
>> > >>> The idea behind async checkpointing is to unblock the operator while
>> > >>> the state is getting transferred to HDFS.
>> > >>> Just to clarify: beginWindow(x) -> endWindow(x) -> checkpointed(x-1)
>> > >>> should be the ideal sequence, but if HDFS is slow, or transferring
>> > >>> the state to HDFS is slow for some other reason, this sequence may
>> > >>> not hold true.
>> > >>>
>> > >>> Can your use case be addressed by
>> > >>> https://malhar.atlassian.net/browse/APEX-78 ?
>> > >>>
>> > >>> Thanks
>> > >>> - Gaurav
>> > >>>
>> > >>>> On Nov 22, 2015, at 3:56 PM, Chandni Singh <[email protected]> wrote:
>> > >>>>
>> > >>>> With async checkpointing, the checkpointed callback in
>> > >>>> CheckpointListener is called for a previous window, that is,
>> > >>>> beginWindow(x) -> endWindow(x) -> checkpointed(x-1)
>> > >>>>
>> > >>>> This feature was newly introduced.
>> > >>>> With synchronous checkpointing, the behavior was always
>> > >>>> beginWindow(x) -> endWindow(x) -> checkpointed(x)
>> > >>>>
>> > >>>> A lot of operators were written before asynchronous checkpointing
>> > >>>> was introduced, and a few of them rely on the sequencing guaranteed
>> > >>>> by synchronous checkpointing.
>> > >>>>
>> > >>>> So why was async checkpointing made the default?
>> > >>>>
>> > >>>> With how async checkpointing works today, the complexity of handling
>> > >>>> transient state in the checkpointed callback falls on every operator.
>> > >>>> For example, let's say that earlier I had a transient map which I
>> > >>>> cleared every time checkpointed was called; with async checkpointing
>> > >>>> this simple task becomes a lot more complicated.
>> > >>>>
>> > >>>> I think async checkpointing broke the semantics of operator callbacks
>> > >>>> and should NOT be the default.
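
To make the transient-map problem from the original message concrete, here is a minimal sketch in plain Java. It is not the Apex or Malhar API: the class WindowedTransientState and its methods are hypothetical and only mimic the beginWindow / endWindow / checkpointed callbacks discussed in this thread. It assumes one possible workaround, namely keying transient state by window ID so that checkpointed(w) releases only what was collected up to window w, even when the callback arrives after later windows have already run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical operator sketch; it does not implement any real Apex interface.
final class WindowedTransientState {
    // Transient tuples grouped by the window that produced them.
    private final TreeMap<Long, List<String>> pendingByWindow = new TreeMap<>();
    private long currentWindow;

    void beginWindow(long windowId) {
        currentWindow = windowId;
    }

    void process(String tuple) {
        pendingByWindow.computeIfAbsent(currentWindow, w -> new ArrayList<>()).add(tuple);
    }

    void endWindow() {
        // Nothing to do here in this sketch.
    }

    // Releases only the state from windows <= windowId. Clearing the whole map
    // here (the pre-async habit) would also drop state from later windows.
    List<String> checkpointed(long windowId) {
        NavigableMap<Long, List<String>> covered = pendingByWindow.headMap(windowId, true);
        List<String> released = new ArrayList<>();
        for (List<String> tuples : covered.values()) {
            released.addAll(tuples);
        }
        covered.clear(); // the view is backed by pendingByWindow, so this removes the entries
        return released;
    }

    int pendingCount() {
        int count = 0;
        for (List<String> tuples : pendingByWindow.values()) {
            count += tuples.size();
        }
        return count;
    }

    public static void main(String[] args) {
        WindowedTransientState op = new WindowedTransientState();
        op.beginWindow(1L);
        op.process("a");
        op.endWindow();
        op.beginWindow(2L);
        op.process("b");
        op.endWindow();
        // Async case: the callback for window 1 arrives after endWindow(2).
        System.out.println(op.checkpointed(1L)); // prints [a]
        System.out.println(op.pendingCount());   // prints 1
    }
}
```

The TreeMap is used only because headMap gives a cheap "everything up to window w" view. Whether this bookkeeping belongs in every operator or in the platform is exactly the question raised in the thread.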
