Currently, the Python SDK supports an eager execution mode.  For example, a
list can be directly passed into a PTransform to obtain its result:

result = [1, 2, 3] | MyPTransform()

To support this use, the Python DirectRunner has an option to cache its
intermediate results into a PValueCache.  The above line, when run,
implicitly creates an ephemeral pipeline and runs it with the
DirectRunner.  This, however, adds a lot of complexity to the DirectRunner,
and is not generalizable to other in-process Python runners (like the
in-process Python FnApiRunner, which runs batch pipelines more efficiently
than the current Python DirectRunner).
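
As a rough illustration of the mechanism (a toy model only, not Beam's actual code), the sketch below shows how the reflected-or hook `__ror__` lets a plain list on the left of `|` trigger an immediate run. ToyRunner, ToyTransform and Double are hypothetical names invented for this example. The transform hands the work to whichever runner object it holds, so nothing here is tied to one particular runner.

```python
class ToyRunner:
    """Hypothetical stand-in for an in-process runner."""

    def run(self, transform, elements):
        # An "ephemeral pipeline": apply the transform once and
        # return the materialized result immediately.
        return transform.expand(list(elements))


class ToyTransform:
    """Hypothetical stand-in for a PTransform with eager support."""

    # Any object with a compatible run() method could be swapped in here.
    runner = ToyRunner()

    def expand(self, elements):
        raise NotImplementedError

    def __ror__(self, left):
        # Python calls this for `left | self` when `left` (here a list)
        # does not itself define `|` for this operand.
        return self.runner.run(self, left)


class Double(ToyTransform):
    def expand(self, elements):
        return [2 * x for x in elements]


result = [1, 2, 3] | Double()
```

In this toy model, replacing `ToyTransform.runner` with another object that exposes the same `run()` method changes where the work executes without touching the transform itself.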

To improve this, I will remove this DirectRunner-specific
implementation and add functionality that allows all in-process Python
runners to run in eager mode.

Jira issue: https://issues.apache.org/jira/browse/BEAM-3537
Candidate fix: https://github.com/apache/beam/pull/4492

Best,
Charles
