Thanks Alexey! The materialization of PCollection data directly from cache
instead of going through the pipeline result would be very helpful for what
we want to achieve!

On Fri, Sep 6, 2019 at 12:31 PM Alexey Strokach <[email protected]> wrote:

> Hi everyone,
>
> I have recently finished my internship at Google, which involved doing
> some work with Apache Beam in a Jupyter Notebook environment. One
> limitation that I encountered with my workflow is the lack of support for
> introspecting the contents of a PCollection and the excessive boilerplate
> required to move data between a Beam Pipeline and the Python interpreter.
>
> With guidance from Vanya Tarasonv and Harsh Vardhan, I have created a
> design document which describes those limitations:
> https://docs.google.com/document/d/1sISjl4Q60mR1V22R1UZd417wVEn_EmZT-SalTHXG4H0/
> .
>
> I also have two PRs outstanding, which add support for materializing and
> accessing bounded and unbounded PCollections both from a Beam Pipeline and
> from the Python interpreter.
> - https://github.com/apache/beam/pull/8884
> - https://github.com/apache/beam/pull/8961
>
> I am aware of the work being carried out by +Ning Kang and +David Yan on
> [Interactive Beam](
> https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/),
> and upon discussion, it does not appear that our PRs would conflict with
> their vision.
>
> Any feedback from the Apache Beam community would be very much appreciated
> :).
>
> Thank you,
> Alexey
