Decoding checkpoint is required only while resuming a reader. Typically
this happens: while reopening the reader after it is closed (for any
reason), or while restarting the pipeline with previous checkpoint, as in a
Dataflow update, or when the work moves to a different worker, or if the
worker restarts etc.

Dataflow typically does not close its readers unless it is really required.
Opening a reader is typically expensive. Currently a reader is closed when
the works moves (see above) or the reader is not read from for 1 minute.
Note that the 1 minute inactivity timeout is not same as idle reader
without any any input. Beam keeps polling the reader for more input. A
reader might not polled in some resource constrained pipelines or if lot of
higher priority work gets scheduled (e.g. when a large window fires).

If you want to force a reader to close and resume in your test, throw an
IOException. Dataflow will restart the reader in streaming. Obviously this
is only for testing.

On Thu, Oct 18, 2018 at 9:32 AM flyisland <[email protected]> wrote:

> Hi gurus,
>
> I've added some debug output in my UnboundedReader and Checkpoint classes,
> and noticed that the Dataflow Runner keeps encoding the checkpoint objects,
> but never decode it, and never invoke the UnboundedReader.close() method in
> my testing.
>
> However, the Direct Runner will decode the checkpoint object just after
> encode it,  and it will also invoke the UnboundedReader.close() method and
> start new reader every now and then.
>
> The google Dataflow will be the future production environment, so I'd like
> to know when will the Dataflow Runner invoke the UnboundedReader.close()
> method and decode the checkpoint?
>
> Thanks in advance!
>
> Island Chen
>

Reply via email to