Thanks Raghu, this is very helpful! On Friday, October 19, 2018, Raghu Angadi <[email protected]> wrote:
> Decoding checkpoint is required only while resuming a reader. Typically > this happens: while reopening the reader after it is closed (for any > reason), or while restarting the pipeline with previous checkpoint, as in a > Dataflow update, or when the work moves to a different worker, or if the > worker restarts etc. > > Dataflow typically does not close its readers unless it is really > required. Opening a reader is typically expensive. Currently a reader is > closed when the works moves (see above) or the reader is not read from for > 1 minute. Note that the 1 minute inactivity timeout is not same as idle > reader without any any input. Beam keeps polling the reader for more input. > A reader might not polled in some resource constrained pipelines or if lot > of higher priority work gets scheduled (e.g. when a large window fires). > > If you want to force a reader to close and resume in your test, throw an > IOException. Dataflow will restart the reader in streaming. Obviously this > is only for testing. > > On Thu, Oct 18, 2018 at 9:32 AM flyisland <[email protected]> wrote: > >> Hi gurus, >> >> I've added some debug output in my UnboundedReader and Checkpoint >> classes, and noticed that the Dataflow Runner keeps encoding the checkpoint >> objects, but never decode it, and never invoke the UnboundedReader.close() >> method in my testing. >> >> However, the Direct Runner will decode the checkpoint object just after >> encode it, and it will also invoke the UnboundedReader.close() method and >> start new reader every now and then. >> >> The google Dataflow will be the future production environment, so I'd >> like to know when will the Dataflow Runner invoke the >> UnboundedReader.close() method and decode the checkpoint? >> >> Thanks in advance! >> >> Island Chen >> >
