Re: Removing the PValueCache from the Beam Python DirectRunner

Robert Bradshaw Thu, 25 Jan 2018 16:07:43 -0800

Sounds good. I assume there will still need to be runner-specific
support for any runner that chooses to implement this (e.g. writing to
remote files and then reading them back in)?


On Thu, Jan 25, 2018 at 3:25 PM, Charles Chen <[email protected]> wrote:
> Currently, the Python SDK supports an eager execution mode.  For example, a
> list can be directly passed into a PTransform to obtain its result:
>
> result = [1, 2, 3] | MyPTransform()
>
> To support this use, the Python DirectRunner has an option to cache its
> intermediate results into a PValueCache.  The above line, when run,
> implicitly creates an ephemeral pipeline and runs it with the DirectRunner.
> This, however, adds a lot of complexity to the DirectRunner, and is not
> generalizable to other in-process Python runners (like the in-process Python
> FnApiRunner, which runs batch pipelines more efficiently than the current
> Python DirectRunner).
>
> To improve this, I will be removing this DirectRunner-specific
> implementation and adding functionality that allows all in-process Python
> runners to be run in eager mode.
>
> Jira issue: https://issues.apache.org/jira/browse/BEAM-3537
> Candidate fix: https://github.com/apache/beam/pull/4492
>
> Best,
> Charles
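
For readers unfamiliar with the eager mode described above, here is a minimal, self-contained sketch of how a `list | transform` expression can run a transform immediately through Python's reflected-operator hook. It does not use Beam and is not the SDK's implementation: ToyRunner, ToyPTransform and DoubleEach are hypothetical names made up for illustration, and the "runner" simply calls the transform in-process.

```python
class ToyRunner:
    """Hypothetical stand-in for an in-process runner (not a Beam class)."""

    def run(self, transform, values):
        # A real runner would build and execute a pipeline; this toy
        # version just applies the transform to the materialized input.
        return transform.expand(list(values))


class ToyPTransform:
    """Hypothetical base class illustrating eager application via `|`."""

    # Any object with a compatible run() method could be swapped in here.
    runner = ToyRunner()

    def expand(self, values):
        raise NotImplementedError

    def __ror__(self, left):
        # list does not define `|`, so `some_list | transform` falls back
        # to the right operand's __ror__, which runs the transform at once.
        return self.runner.run(self, left)


class DoubleEach(ToyPTransform):
    """Toy transform that doubles every input element."""

    def expand(self, values):
        return [2 * v for v in values]


result = [1, 2, 3] | DoubleEach()
print(result)  # [2, 4, 6]
```

Because the transform only talks to a `runner` object, nothing in this sketch is tied to one particular runner, which is the property the proposal is after for in-process Python runners.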
