Hi Beamers, Now that things are getting started and we discuss the technical vision of Beam, we would like to contribute the Flink runner and start by sharing some details about the status and the roadmap.
The Flink Runner integrates deeply and naturally with the Dataflow SDK (the Beam precursor), because the Flink DataStream API shares many concepts with the Dataflow model. Based on whether the program input is bounded or unbounded, the program goes against Flink's DataStream or DataSet API. A quick preview at some of the nice features of the runner: - Support for stream transformations, event time, watermarks - The full Dataflow windowing semantics, including fixed/sliding time windows, and session windows - Integration with Flink's streaming sources (Kafka, RabbitMQ, ...) - Batch (bounded sources) integrates fully with Flink's managed memory techniques and out-of-core algorithms, supporting huge joins and aggregations. - Integration with Flink's batch API sources (plain text, CSV, Avro, JDBC, HBase, ...) - Integration with Flink's fault tolerance - both batch and streaming program recover from failures - After upgrading the dependency to Flink 1.0, one could even use the Flink Savepoints feature (save streaming state for later resuming) with the Dataflow programs. Attached you can find the document we drew up with more information about the current state of the Runner and the roadmap for its upcoming features: https://docs.google.com/document/d/1QM_X70VvxWksAQ5C114MoAKb1d9Vzl2dLxEZM4WYogo/edit?usp=sharing The Runner executes the quasi complete Beam streaming model (well, Dataflow, actually, because Beam is not there, yet). Given the current excitement and buzz around Beam, we could add this runner to the Beam repository and link it as a "preview" for the people that want to get a feeling of what it will be like to write and run streaming (unbounded) Beam programs. That would give people something tangible until the actual Beam code is available. What do you think? Best, Max
