Thanks Raghu, this is very helpful!

On Friday, October 19, 2018, Raghu Angadi <[email protected]> wrote:

> Decoding checkpoint is required only while resuming a reader. Typically
> this happens: while reopening the reader after it is closed (for any
> reason), or while restarting the pipeline with previous checkpoint, as in a
> Dataflow update, or when the work moves to a different worker, or if the
> worker restarts etc.
>
> Dataflow typically does not close its readers unless it is really
> required. Opening a reader is typically expensive. Currently a reader is
> closed when the works moves (see above) or the reader is not read from for
> 1 minute. Note that the 1 minute inactivity timeout is not same as idle
> reader without any any input. Beam keeps polling the reader for more input.
> A reader might not polled in some resource constrained pipelines or if lot
> of higher priority work gets scheduled (e.g. when a large window fires).
>
> If you want to force a reader to close and resume in your test, throw an
> IOException. Dataflow will restart the reader in streaming. Obviously this
> is only for testing.
>
> On Thu, Oct 18, 2018 at 9:32 AM flyisland <[email protected]> wrote:
>
>> Hi gurus,
>>
>> I've added some debug output in my UnboundedReader and Checkpoint
>> classes, and noticed that the Dataflow Runner keeps encoding the checkpoint
>> objects, but never decode it, and never invoke the UnboundedReader.close()
>> method in my testing.
>>
>> However, the Direct Runner will decode the checkpoint object just after
>> encode it,  and it will also invoke the UnboundedReader.close() method and
>> start new reader every now and then.
>>
>> The google Dataflow will be the future production environment, so I'd
>> like to know when will the Dataflow Runner invoke the
>> UnboundedReader.close() method and decode the checkpoint?
>>
>> Thanks in advance!
>>
>> Island Chen
>>
>

Reply via email to