Great work! This will also be super handy for demoing Beam. Looking forward to playing around with this :)
On 19.03.20 00:52, Kenneth Knowles wrote: > Nice! > > On Wed, Mar 18, 2020 at 2:58 PM Ahmet Altay <[email protected] > <mailto:[email protected]>> wrote: > > Great to see this progress! :) > > On Wed, Mar 18, 2020 at 2:57 PM Reza Rokni <[email protected] > <mailto:[email protected]>> wrote: > > Awesome ! > > On Thu, 19 Mar 2020, 05:38 Sam Rohde, <[email protected] > <mailto:[email protected]>> wrote: > > Hi All! > > > > I am happy to announce that an improved Interactive Runner > is now available on master. This Python runner allows for > the interactive development of Beam pipelines in a notebook > (and IPython) environment. > > > > The runner still has some bugs that need to be fixed as well > as some refactoring, but it is in a good enough shape to > start using it. > > > > Here are the new things you can do with the Interactive Runner: > > * > > Create and execute pipelines within a REPL > > * > > Visualize elements as the pipeline is running > > * > > Materialize PCollections to DataFrames > > * > > Record unbounded sources for deterministic replay > > * > > Replay cached unbounded sources including watermark > advancements > > The code lives insdks/python/apache_beam/runners/interactive > > <https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive>and > example notebooks are > insdks/python/apache_beam/runners/interactive/examples > > <https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive/examples>. > > > > To install, use `pip install -e .[interactive]` in your > <project root>/sdks/python directory. > > To run, here’s a quick example: > > ``` > > import apache_beam as beam > > from apache_beam.runners.interactive.interactive_runner > import InteractiveRunner > > import apache_beam.runners.interactive.interactive_beam as ib > > > > p = beam.Pipeline(InteractiveRunner()) > > words = p | beam.Create(['To', 'be', 'or', 'not', 'to', 'be']) > > counts = words | 'count' >> beam.combiners.Count.PerElement() > > > > # Shows a dynamically updating display of the PCollection > elements > > ib.show(counts) > > > > # We can now visualize the data using standard pandas > operations. > > df = ib.collect(counts) > > print(df.info <http://df.info>()) > > print(df.describe()) > > > > # Plot the top-10 counted words > > df = df.sort_values(by=1, ascending=False) > > df.head(n=10).plot(x=0, y=1) > > ``` > > > > Currently, Batch is supported on any runner. Streaming is > only supported on the DirectRunner (non-FnAPI). > > > > I would like to thank the great work of Sindy (@sindyli) and > Harsh (@ananvay) for the initial implementation, > > David Yan (@davidyan) who led the project, Ning (@ningk) and > myself (@srohde) for the implementation and design, and > Ahmet (@altay), Daniel (@millsd), Pablo (@pabloem), and > Robert (@robertwb) who all contributed a lot of their time > to help with the design and code reviews. > > > > It was a team effort and we wouldn't have been able to > complete it without the help of everyone involved. > > > > Regards, > > Sam > >
