Sounds good. I assume there will still need to be runner-specific support for any runner that chooses to implement this (e.g. writing to remote files then reading them in?)
On Thu, Jan 25, 2018 at 3:25 PM, Charles Chen <[email protected]> wrote: > Currently, the Python SDK supports an eager execution mode. For example, a > list can be directly passed into a PTransform to obtain its result: > > result = [1, 2, 3] | MyPTransform() > > To support this use, the Python DirectRunner has an option to cache its > intermediate results into a PValueCache. The above line, when run, > implicitly creates an ephemeral pipeline and runs it with the DirectRunner. > This, however, adds a lot of complexity to the DirectRunner, and is not > generalizable to other in-process Python runners (like the in-process Python > FnApiRunner, which runs batch pipelines more efficiently than the current > Python DirectRunner). > > To improve this, I will be removing this DirectRunner-specific > implementation and add functionality that allows all in-process Python > runners to be run in eager mode. > > Jira issue: https://issues.apache.org/jira/browse/BEAM-3537 > Candidate fix: https://github.com/apache/beam/pull/4492 > > Best, > Charles
