Currently, the Python SDK supports an eager execution mode. For example, a list can be directly passed into a PTransform to obtain its result:
result = [1, 2, 3] | MyPTransform() To support this use, the Python DirectRunner has an option to cache its intermediate results into a PValueCache. The above line, when run, implicitly creates an ephemeral pipeline and runs it with the DirectRunner. This, however, adds a lot of complexity to the DirectRunner, and is not generalizable to other in-process Python runners (like the in-process Python FnApiRunner, which runs batch pipelines more efficiently than the current Python DirectRunner). To improve this, I will be removing this DirectRunner-specific implementation and add functionality that allows all in-process Python runners to be run in eager mode. Jira issue: https://issues.apache.org/jira/browse/BEAM-3537 Candidate fix: https://github.com/apache/beam/pull/4492 Best, Charles
