Hi everyone,

I recently finished my internship at Google, which involved working with Apache Beam in a Jupyter Notebook environment. Two limitations that I encountered in my workflow were the lack of support for introspecting the contents of a PCollection and the excessive boilerplate required to move data between a Beam Pipeline and the Python interpreter.

With guidance from Vanya Tarasonv and Harsh Vardhan, I have written a design document that describes those limitations: https://docs.google.com/document/d/1sISjl4Q60mR1V22R1UZd417wVEn_EmZT-SalTHXG4H0/

I also have two outstanding PRs, which add support for materializing and accessing bounded and unbounded PCollections both from a Beam Pipeline and from the Python interpreter:

- https://github.com/apache/beam/pull/8884
- https://github.com/apache/beam/pull/8961

I am aware of the work being carried out by +Ning Kang and +David Yan on [Interactive Beam](https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/), and after discussing it with them, it does not appear that our PRs would conflict with their vision.

Any feedback from the Apache Beam community would be very much appreciated :).

Thank you,
Alexey
