Decoding a checkpoint is required only when resuming a reader. Typically this happens when the reader is reopened after being closed (for any reason), when the pipeline is restarted from a previous checkpoint (as in a Dataflow update), when the work moves to a different worker, or when the worker restarts.
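To make the resume path concrete, here is a minimal plain-Java sketch of what a runner roughly does when a reader fails in a test. It does not use the Beam API: `ResumeSketch`, `Checkpoint`, `Reader`, `Result`, and `run` are hypothetical names, and saving a checkpoint after every record is a simplification I am assuming for illustration. The point is only that the checkpoint is encoded repeatedly but decoded just once, when a new reader has to be created.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins, not Beam classes: a toy reader, checkpoint and runner loop.
class ResumeSketch {

    // The checkpoint only remembers the next offset to read.
    static final class Checkpoint {
        final long nextOffset;
        Checkpoint(long nextOffset) { this.nextOffset = nextOffset; }

        byte[] encode() {
            return ByteBuffer.allocate(Long.BYTES).putLong(nextOffset).array();
        }

        static Checkpoint decode(byte[] bytes) {
            return new Checkpoint(ByteBuffer.wrap(bytes).getLong());
        }
    }

    // Reads offsets [start, end) and throws once when it reaches failAt (-1 disables).
    static final class Reader {
        private final long end;
        private final long failAt;
        private long next;
        private long current;

        Reader(Checkpoint resumeFrom, long end, long failAt) {
            this.next = (resumeFrom == null) ? 0 : resumeFrom.nextOffset;
            this.end = end;
            this.failAt = failAt;
        }

        boolean advance() throws IOException {
            if (next == failAt) {
                throw new IOException("forced failure, for testing only");
            }
            if (next >= end) {
                return false;
            }
            current = next++;
            return true;
        }

        long getCurrent() { return current; }
        Checkpoint getCheckpointMark() { return new Checkpoint(next); }
        void close() { /* nothing to release in this sketch */ }
    }

    static final class Result {
        final List<Long> records = new ArrayList<>();
        int decodes = 0;
        int closes = 0;
    }

    // Toy runner loop: encode a checkpoint after each record, decode only on resume.
    static Result run(long end, long failAt) {
        Result result = new Result();
        byte[] saved = null;
        Reader reader = new Reader(null, end, failAt);
        while (true) {
            try {
                if (!reader.advance()) {
                    break;
                }
                result.records.add(reader.getCurrent());
                saved = reader.getCheckpointMark().encode();
            } catch (IOException e) {
                // The failed reader is closed and a new one resumes from the last checkpoint.
                reader.close();
                result.closes++;
                Checkpoint resumeFrom = null;
                if (saved != null) {
                    resumeFrom = Checkpoint.decode(saved);
                    result.decodes++;
                }
                reader = new Reader(resumeFrom, end, -1); // do not fail a second time
            }
        }
        reader.close();
        result.closes++;
        return result;
    }

    public static void main(String[] args) {
        Result r = run(5, 3);
        System.out.println("records=" + r.records + " decodes=" + r.decodes + " closes=" + r.closes);
    }
}
```

In a run with no failure, `decodes` stays at 0 even though a checkpoint is encoded after every record, which matches what you are seeing on Dataflow.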
Dataflow typically does not close its readers unless it is really required, since opening a reader is typically expensive. Currently a reader is closed when the work moves (see above) or when the reader has not been read from for 1 minute. Note that the 1-minute inactivity timeout is not the same as a reader sitting idle with no input: Beam keeps polling the reader for more input, so a quiet source alone does not trigger it. A reader might not be polled in some resource-constrained pipelines, or if a lot of higher-priority work gets scheduled (e.g. when a large window fires).

If you want to force a reader to close and resume in your test, throw an IOException. Dataflow will restart the reader in streaming. Obviously this is only for testing.

On Thu, Oct 18, 2018 at 9:32 AM flyisland <[email protected]> wrote:

> Hi gurus,
>
> I've added some debug output in my UnboundedReader and Checkpoint classes,
> and noticed that the Dataflow Runner keeps encoding the checkpoint objects,
> but never decode it, and never invoke the UnboundedReader.close() method in
> my testing.
>
> However, the Direct Runner will decode the checkpoint object just after
> encode it, and it will also invoke the UnboundedReader.close() method and
> start new reader every now and then.
>
> The google Dataflow will be the future production environment, so I'd like
> to know when will the Dataflow Runner invoke the UnboundedReader.close()
> method and decode the checkpoint?
>
> Thanks in advance!
>
> Island Chen
>
