Hi All!
I am happy to announce that an improved Interactive Runner is now available on master. This Python runner allows for the interactive development of Beam pipelines in a notebook (and IPython) environment. The runner still has some bugs that need to be fixed as well as some refactoring, but it is in a good enough shape to start using it. Here are the new things you can do with the Interactive Runner: - Create and execute pipelines within a REPL - Visualize elements as the pipeline is running - Materialize PCollections to DataFrames - Record unbounded sources for deterministic replay - Replay cached unbounded sources including watermark advancements The code lives in sdks/python/apache_beam/runners/interactive <https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive> and example notebooks are in sdks/python/apache_beam/runners/interactive/examples <https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive/examples> . To install, use `pip install -e .[interactive]` in your <project root>/sdks/python directory. To run, here’s a quick example: ``` import apache_beam as beam from apache_beam.runners.interactive.interactive_runner import InteractiveRunner import apache_beam.runners.interactive.interactive_beam as ib p = beam.Pipeline(InteractiveRunner()) words = p | beam.Create(['To', 'be', 'or', 'not', 'to', 'be']) counts = words | 'count' >> beam.combiners.Count.PerElement() # Shows a dynamically updating display of the PCollection elements ib.show(counts) # We can now visualize the data using standard pandas operations. df = ib.collect(counts) print(df.info()) print(df.describe()) # Plot the top-10 counted words df = df.sort_values(by=1, ascending=False) df.head(n=10).plot(x=0, y=1) ``` Currently, Batch is supported on any runner. Streaming is only supported on the DirectRunner (non-FnAPI). I would like to thank the great work of Sindy (@sindyli) and Harsh (@ananvay) for the initial implementation, David Yan (@davidyan) who led the project, Ning (@ningk) and myself (@srohde) for the implementation and design, and Ahmet (@altay), Daniel (@millsd), Pablo (@pabloem), and Robert (@robertwb) who all contributed a lot of their time to help with the design and code reviews. It was a team effort and we wouldn't have been able to complete it without the help of everyone involved. Regards, Sam
