Interactive Beam - support for caching and introspection of PCollections

Alexey Strokach Fri, 06 Sep 2019 12:31:26 -0700

Hi everyone,

I have recently finished my internship at Google, which involved doing some
work with Apache Beam in a Jupyter Notebook environment. Two limitations
that I encountered in my workflow are the lack of support for
introspecting the contents of a PCollection and the excessive boilerplate
required to move data between a Beam Pipeline and the Python interpreter.


With guidance from Vanya Tarasonv and Harsh Vardhan, I have created a
design document which describes those limitations:
https://docs.google.com/document/d/1sISjl4Q60mR1V22R1UZd417wVEn_EmZT-SalTHXG4H0/

I also have two PRs outstanding, which add support for materializing and
accessing bounded and unbounded PCollections both from a Beam Pipeline and
from the Python interpreter.
- https://github.com/apache/beam/pull/8884
- https://github.com/apache/beam/pull/8961

I am aware of the work being carried out by +Ning Kang and +David Yan on
Interactive Beam
(https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/),
and upon discussion, it does not appear that our PRs would conflict with
their vision.

Any feedback from the Apache Beam community would be very much appreciated
:).

Thank you,
Alexey
