Agreed. Thomas's solution fixes the backward incompatibility. I think we really need to fix this.
On Sun, Nov 22, 2015 at 10:23 PM, Timothy Farkas <[email protected]> wrote: > Gaurav, > > I think if the state copy fails then STRAM should roll back the operator to > a checkpoint that is further back than the last checkpoint. If you are > saying that you want to preserve the semantic that checkpointed is only > called after a checkpoint is completed, I would argue that that guarantee > is already pointless in the current implementation since it is always > possible for an operator to be rolled back to a checkpoint before it's last > completed checkpoint. So, it is already currently possible for some > database or file operation performed after a completed checkpoint to be > redone after a failure. Because of this I think Thomas's solution makes the > most sense. Thomas's solution would also address Chandni's original point > that the semantics for the checkpointed call back have been violated. There > are operators in our libraries that have depended on the beginWindow(x), > endWindow(x), and checkpointed(x) call sequence, which is now broken. We > should probably fix that. > > Tim > > On Sun, Nov 22, 2015 at 10:02 PM, Gaurav Gupta <[email protected]> > wrote: > > > Thomas, > > > > This was done to preserve checkpointing semantics that is to tell the > > operator that its state is preserved. Say if database is updated or files > > are moved in checkpointed call but the state copy fails, how to address > > such scenarios? > > > > Thanks > > - Gaurav > > > > > On Nov 22, 2015, at 9:44 PM, Thomas Weise <[email protected]> > > wrote: > > > > > > Alternatively I would ask why the checkpointed callback needs to wait > > until > > > the data was copied to HDFS instead upon completion of the state > > > serialization. > > > > > > Thomas > > > > > > > > > On Sun, Nov 22, 2015 at 9:41 PM, Chandni Singh < > [email protected]> > > > wrote: > > > > > >> Gaurav, > > >> > > >> My question is about why Async was made the default when it changed > the > > >> semantics of operator callbacks. Your response doesn't answer that. > > >> > > >> In a way we broke backward compatibility. > > >> > > >> Chandni > > >> > > >> On Sun, Nov 22, 2015 at 9:22 PM, Gaurav Gupta <[email protected] > > > > >> wrote: > > >> > > >>> The idea behind Async checkpointing is to unblock operator while the > > >> state > > >>> is getting transferred to HDFS. > > >>> Just to clarify that this beginWindow (x) -> endWindow(x) -> > > checkpointed > > >>> (x-1 ) should be an ideal sequence, but if the HDFS is slow or for > some > > >>> other reason transferring the state to HDFS is slow this sequence may > > not > > >>> hold true. > > >>> > > >>> Can your use case be addressed by > > >>> https://malhar.atlassian.net/browse/APEX-78 < > > >>> https://malhar.atlassian.net/browse/APEX-78>? > > >>> > > >>> Thanks > > >>> - Gaurav > > >>> > > >>>> On Nov 22, 2015, at 3:56 PM, Chandni Singh <[email protected] > > > > >>> wrote: > > >>>> > > >>>> With Async checkpointing the checkpoint callback in CheckpointPoint > > >>>> listener is called for a previous window, that is, > > >>>> beginWindow (x) -> endWindow(x) -> checkpointed (x-1 ) > > >>>> > > >>>> This feature was newly introduced. With synchronous checkpointing, > the > > >>>> behavior was always > > >>>> beginWindow(x) -> endWindow(x) -> checkpointed (x) > > >>>> > > >>>> A lot of operators were written before asynchronous checkpointing > was > > >>>> introduced and few of them can rely on the sequencing guaranteed by > > >>>> synchronous checkpointing. > > >>>> > > >>>> So why was Async Checkpointed made default? > > >>>> > > >>>> With how Async checkpoint is today, the complexity to handle > transient > > >>>> state in checkpointed callback falls on every operator. For eg, lets > > >> say > > >>>> earlier I had a transient map which I cleared every time the > > >> checkpointed > > >>>> was called, with async checkpointing this simple task will be a lot > > >> more > > >>>> complicated. > > >>>> > > >>>> I think Async checkpointing broke the semantics of operator > callbacks > > >> and > > >>>> should NOT be the default. > > >>> > > >>> > > >> > > > > >
