Checkpointing is an asynchronous operation. The following subtasks take
place as part of checkpointing:

1. The Streaming Container collects the state of the operator as of the
end of the window identified by windowId.
2. The state is dispatched for saving to persistent storage (HDFS by
default).
3. Once step 2 is done, the windowId is reported to the Streaming App
Master.
4. The Streaming App Master updates its books and acks back to the
Streaming Container.
5. The Streaming Container invokes the CheckpointListener interface.

These steps can take an unpredictable amount of time, so immediately
after finishing step 1 the operator moves on to processing the next
window, while steps 2 onwards are executed in different threads and
joined in step 5. Hence the windowId provided to CheckpointListener is
often behind the last windowId processed by the operator. The difference
is typically 1, but it can be any positive integer.
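
To make the lag concrete, here is a rough sketch in plain Java that
mimics the five steps with a background thread. It is only a simulation
under my own assumptions: CheckpointLagSketch, Callback, run and deliver
are made-up names, a ConcurrentHashMap stands in for HDFS, and the App
Master ack is reduced to a queue. It does not use or reproduce the
platform code. The only property it illustrates is that the id handed to
the callback is never ahead of the last window the operator finished, and
can trail it by an amount that depends on thread timing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical simulation; none of these names come from the platform.
final class CheckpointLagSketch {

    /** One delivered callback, plus the last window the operator had finished by then. */
    static final class Callback {
        final long checkpointedWindowId;
        final long lastProcessedWindowId;

        Callback(long checkpointedWindowId, long lastProcessedWindowId) {
            this.checkpointedWindowId = checkpointedWindowId;
            this.lastProcessedWindowId = lastProcessedWindowId;
        }
    }

    static List<Callback> run(int windows) {
        ExecutorService saver = Executors.newSingleThreadExecutor();
        Map<Long, Long> storage = new ConcurrentHashMap<>();   // stands in for HDFS
        ConcurrentLinkedQueue<Long> acked = new ConcurrentLinkedQueue<>();
        List<Callback> delivered = new ArrayList<>();
        long state = 0;

        for (long windowId = 1; windowId <= windows; windowId++) {
            // Between windows: deliver callbacks for checkpoints acked so far (step 5).
            deliver(acked, delivered, windowId - 1);

            state += windowId;              // "process" the window on the operator thread

            final long id = windowId;
            final long snapshot = state;    // step 1: state captured synchronously
            saver.execute(() -> {
                storage.put(id, snapshot);  // step 2: save off the operator thread
                acked.add(id);              // steps 3-4: report and ack, collapsed
            });
        }

        saver.shutdown();
        try {
            if (!saver.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("saver did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        deliver(acked, delivered, windows); // callbacks still pending after the last window
        return delivered;
    }

    private static void deliver(ConcurrentLinkedQueue<Long> acked,
                                List<Callback> out, long lastProcessed) {
        Long id;
        while ((id = acked.poll()) != null) {
            out.add(new Callback(id, lastProcessed));
        }
    }

    public static void main(String[] args) {
        for (Callback cb : run(5)) {
            System.out.println("checkpointed(" + cb.checkpointedWindowId
                + ") delivered after window " + cb.lastProcessedWindowId);
        }
    }
}
```

The practical consequence for an operator is the same as in the
explanation above: treat the id passed to the callback as "state up to
this window is safely stored", and do not compare it for equality with
the window currently being processed.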

--
Chetan


On 11/10/15, 3:36 AM, "Bhupesh Chawda" <[email protected]> wrote:

>Hi Chetan / Community,
>
>Can someone please elaborate on why the window id supplied to
>CheckpointListener and the Operator would differ.
>I tried looking at the window ids of checkpointed() and the beginWindow()
>calls and they differ by 1. Don't know why this should be the case.
>
>Thanks.
>-Bhupesh
>
>On Thu, Sep 17, 2015 at 5:56 AM, Chetan Narsude <[email protected]>
>wrote:
>
>> Short answer is yes.
>>
>> All the control tuples are scheduled to be delivered outside of the
>>window.
>> As checkpointed callback is triggered because of CHECKPOINT control
>>tuple,
>> it will happen after endWindow and before the next beginWindow.
>>
>> The windowId supplied to CheckpointListener and the one provided to
>> Operator need not match even though the sequence is defined. So I am
>> curious how you intend to use this knowledge.
>>
>> --
>> Chetan
>>
>>
>> On Tue, Sep 15, 2015 at 8:31 AM, Thomas Weise <[email protected]>
>> wrote:
>>
>> > It has not changed the operator execution model. State serialization
>>is
>> > still synchronous, write to HDFS is taken out of the operator thread.
>> >
>> > On Tue, Sep 15, 2015 at 8:18 AM, Amol Kekre <[email protected]>
>> wrote:
>> >
>> > >
>> > > Sent too soon. Has asynchronous checkpointing changed this?
>> > >
>> > > Amol
>> > >
>> > > Sent from my iPhone
>> > >
>> > > > On Sep 15, 2015, at 12:38 AM, Bhupesh Chawda <
>> [email protected]>
>> > > wrote:
>> > > >
>> > > > Hi All,
>> > > >
>> > > > Is it safe to assume that the checkpointed() and the beginWindow()
>> > calls
>> > > > are sequenced?
>> > > > In other words, are these calls part of the same thread and may
>>not
>> run
>> > > in
>> > > > parallel?
>> > > >
>> > > > Thanks.
>> > > >
>> > > > --
>> > > > -Bhupesh
>> > >
>> >
>>
