Hi,

The Apex runner is currently in a feature branch:

https://github.com/apache/incubator-beam/tree/apex-runner

So far, the focus has been on functional completeness, and the runner passes
all the integration tests.

With its stateful stream processing architecture, Apex can support all of
the concepts in the Beam model (event time, triggers, watermarks, etc.).
Most of these are already supported through the Beam SDK. The amount of glue
code that had to be written is relatively small, which speaks to the general
conceptual alignment between the two.

In its current form, the runner does not yet leverage all the performance and
scalability that Apex can deliver. We expect to address this with future
contributions that make use of Apex features such as incremental
checkpointing, partitioning and operator affinity.

From a code perspective, the runner should be close to what is needed for a
merge to master (based on the contribution guidelines). The following items
have been identified as prerequisites:

* Add a README.md to the runner directory that summarizes its current state
* Update the https://beam.apache.org/learn/runners/capability-matrix/ to
include the Apex info
* Create the runner page under learn/runners (at least a placeholder)

It should also be noted that the integration tests currently take quite a
long time to run with embedded Apex (~50 minutes). Part of this is due to how
test completion is determined, and there are ideas to improve it.

I have created some JIRAs from my TODO list of follow-up work so that more
contributors can get involved:

https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20component%20%3D%20runner-apex

Some folks on the Apex dev list have expressed interest in taking up some of
this work. And thanks to Ismaël Mejía for BEAM-815
<https://issues.apache.org/jira/browse/BEAM-815>!

I'm looking forward to your comments and suggestions.

Thanks,
Thomas
