Bhupesh, You can also find details in earlier discussion on similar topic http://mail-archives.apache.org/mod_mbox/incubator-apex-dev/201511.mbox/%3CCAMaZnT6Nn6fqKV+gP-fjEcMGt9y6By1XhUVjDo1Y==Apc_=2...@mail.gmail.com%3E
Thanks - Gaurav > On Nov 20, 2015, at 11:24 AM, Munagala Ramanath <[email protected]> wrote: > > To reinforce Tim's point, here is a quote from > https://docs.datatorrent.com/application_development/#checkpointing > > -------------- > In case of an operator that has an application window size that is > larger than the size of the streaming window, the checkpointing by > default still happens at same intervals as with other operators. To > align checkpointing with application window boundary, the application > developer should set the attribute “CHECKPOINT_WINDOW_COUNT” to > “APPLICATION_WINDOW_COUNT”. This ensures that the checkpoint happens > at the end of the application window and not within that window. Such > operators now treat the application window as an atomic computation > unit. The downside is that it does need the upstream buffer server to > keep tuples for the entire application window. > --------------- > > Ram > > On Fri, Nov 20, 2015 at 11:12 AM, Timothy Farkas <[email protected]> wrote: >> Hi Bhupesh, >> >> 1. Yes the checkpoint will be delayed until the end of the checkpont window >> (not the app window). >> 2. For the case OperatorContext.APPLICATION_WINDOW_COUNT = >> OperatorContext.CHECKPOINT_WINDOW_COUNT = 1 I would expect the checkpoint >> to happen at the end of window 59 not 58. Sounds like something is wrong >> their >> For the case OperatorContext.APPLICATION_WINDOW_COUNT = >> OperatorContext.CHECKPOINT_WINDOW_COUNT > 1 Your observations are correct. >> 3. APPLICATION_WINDOW_COUNT plays no role in checkpointing, when a >> checkpoint happens is determined entirely by CHECKPOINT_WINDOW_COUNT. >> However, In a lot of cases however operators need APPLICATION_WINDOW_COUNT >> to equal CHECKPOINT_WINDOW_COUNT in order to be fault tolerant. >> >> Thanks, >> Tim >> >> On Fri, Nov 20, 2015 at 11:03 AM, Bhupesh Chawda <[email protected]> >> wrote: >> >>> Hi All, >>> >>> I have some confusion regarding the attributes which control the >>> checkpointing in Apex. >>> >>> As per my understanding, there are three attributes which play a role in >>> deciding when to perform checkpointing: >>> >>> 1. OperatorContext.APPLICATION_WINDOW_COUNT (How many streaming windows >>> form an app window) >>> 2. OperatorContext.CHECKPOINT_WINDOW_COUNT (Indicates the optimal >>> checkpoint boundary) >>> 3. DagContext.CHECKPOINT_WINDOW_COUNT (how many streaming windows form a >>> checkpoint window) >>> >>> I have the following doubts: >>> >>> - Let's consider the case where checkpointing always happens at an >>> Application window boundary. This implies >>> OperatorContext.APPLICATION_WINDOW_COUNT = >>> OperatorContext.CHECKPOINT_WINDOW_COUNT. The engine will always insert a >>> "checkpoint marker" tuple after the DagContext.CHECKPOINT_WINDOW_COUNT >>> streaming windows are done. However, if this marker arrives at a time >>> when >>> the application window is still not over, then the checkpoint will be >>> delayed till the end of the application window. Is this understanding >>> correct? >>> >>> >>> - One inconsistency that I have observed: When >>> OperatorContext.APPLICATION_WINDOW_COUNT = >>> OperatorContext.CHECKPOINT_WINDOW_COUNT = 1, the first checkpoint >>> happens >>> after the 58th window instead of 59th window (0 indexed). The following >>> checkpoints however happen at the intended 118th, 178th etc. windows. >>> This >>> is for the default case of 60 windows checkpoint. In other cases, when >>> OperatorContext.APPLICATION_WINDOW_COUNT = >>> OperatorContext.CHECKPOINT_WINDOW_COUNT > 1 (and a perfect divisor of >>> DagContext.CHECKPOINT_WINDOW_COUNT), it happens at the 59th window, >>> 119th >>> window and so on. Is this behaviour expected? >>> >>> >>> - Moving on to the other case of allowing checkpointing within >>> application windows; here OperatorContext.APPLICATION_WINDOW_COUNT may >>> not >>> be equal to OperatorContext.CHECKPOINT_WINDOW_COUNT. Similar to the >>> previous case, the "checkpoint marker" tuple arrives at the end of >>> DagContext.CHECKPOINT_WINDOW_TUPLES. In this case, there is no >>> restriction >>> on when not to checkpoint. So what is the role of >>> OperatorContext.CHECKPOINT_WINDOW_COUNT in this case? >>> >>> >>> Thanks. >>> -Bhupesh >>>
