We recently checked in the last few changes needed to support streaming
pipelines on the Beam Python DirectRunner (BEAM-1265
<https://issues.apache.org/jira/browse/BEAM-1265>).  As of HEAD (1-2 weeks
ago) and the 2.1.0 RC, Python SDK users can now write their pipelines in
streaming mode and run them locally on their own machine.

Check out the streaming wordcount example here (streaming_wordcount.py
<https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/streaming_wordcount.py>)
and please kick the tires, try out the new functionality and report any
bugs you may encounter.  Use the "--streaming" PipelineOption to enable
this new functionality.

Currently, the I/Os supported are the TestStream
<https://github.com/apache/beam/blob/master/sdks/python/apache_beam/testing/test_stream.py>
and
Google Cloud PubSub I/O.  Chamikara is working on implementing
SplittableDoFn as the Python streaming source API so that it will be easy
to write new streaming sources.  Python streaming support for other runners
like Cloud Dataflow and Flink will be provided through the FnAPI (please
contact me if you would be interested in joining the Python Streaming Alpha
for Google Cloud Dataflow).

For reference, here are some of the relevant PRs checked in for this effort:

https://github.com/apache/beam/pull/3318
https://github.com/apache/beam/pull/3362
https://github.com/apache/beam/pull/3370
https://github.com/apache/beam/pull/3405
https://github.com/apache/beam/pull/3409
https://github.com/apache/beam/pull/3440
https://github.com/apache/beam/pull/3444
https://github.com/apache/beam/pull/3499

Best,
Charles

Reply via email to