Yes, that is correct. The attached fix is scoped to in-process runners. For remote runners, we should think about how to make PCollection contents available after pipeline execution. We may also need to redesign eager/interactive execution for that use case, since eager mode today is geared toward testing transforms locally.
On Thu, Jan 25, 2018 at 4:07 PM Robert Bradshaw wrote:
> Sounds good. I assume there will still need to be runner-specific
> support for any runner that chooses to implement this (e.g. writing to
> remote files, then reading them in?)
>
> On Thu, Jan 25, 2018 at 3:25 PM, Charles Chen wrote:
> > Currently, the Python SDK supports an eager execution mode. For example, a
> > list can be passed directly into a PTransform to obtain its result:
> >
> >   result = [1, 2, 3] | MyPTransform()
> >
> > To support this use, the Python DirectRunner has an option to cache its
> > intermediate results in a PValueCache. When run, the line above
> > implicitly creates an ephemeral pipeline and runs it with the DirectRunner.
> > This, however, adds a lot of complexity to the DirectRunner and does not
> > generalize to other in-process Python runners (like the in-process Python
> > FnApiRunner, which runs batch pipelines more efficiently than the current
> > Python DirectRunner).
> >
> > To improve this, I will be removing this DirectRunner-specific
> > implementation and adding functionality that allows all in-process Python
> > runners to be run in eager mode.
> >
> > Jira issue: https://issues.apache.org/jira/browse/BEAM-3537
> > Candidate fix: https://github.com/apache/beam/pull/4492
> >
> > Best,
> > Charles
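To illustrate the eager mode described in the quoted message, here is a toy Python model of how `list | transform` can build an ephemeral pipeline and run it right away on whichever in-process runner is configured. This is not Beam's implementation or API. `ListRunner`, `EAGER_RUNNER`, `ToyPTransform` and `Double` are all hypothetical names used only for this sketch.

```python
# Toy model of eager execution, NOT Apache Beam's implementation.
# All names (ListRunner, EAGER_RUNNER, ToyPTransform, Double) are hypothetical.


class ListRunner:
    """Hypothetical in-process runner: applies each step to an in-memory list."""

    def run(self, source, steps):
        elements = list(source)
        for step in steps:
            elements = step.expand(elements)
        return elements


# Any object with a run(source, steps) method can be plugged in here, so
# eager evaluation is not tied to one particular runner.
EAGER_RUNNER = ListRunner()


class ToyPTransform:
    """Hypothetical stand-in for a PTransform."""

    def expand(self, elements):
        raise NotImplementedError

    def __ror__(self, left):
        # `[1, 2, 3] | transform` ends up here because list defines no __or__.
        if isinstance(left, (list, tuple)):
            # Build a one-step, throwaway "pipeline" and run it immediately.
            return EAGER_RUNNER.run(left, [self])
        return NotImplemented


class Double(ToyPTransform):
    """Hypothetical element-wise transform that doubles each element."""

    def expand(self, elements):
        return [x * 2 for x in elements]


result = [1, 2, 3] | Double()
print(result)  # [2, 4, 6]
```

The transform looks up the runner only at evaluation time. Because of that, swapping `EAGER_RUNNER` for a different in-process runner leaves user code like `[1, 2, 3] | Double()` unchanged. Remote runners would additionally need a way to ship results back to the caller, as discussed in the reply.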
