Hi Beamers --

I’m thrilled by the recent energy and activity on writing new Beam runners!
But that also means it’s probably time for us to figure out how, as a
community, we want to support this process. ;-)

Back near the beginning, we had a thread [1] that settled on feature
branches as the preferred way to develop features or components that may
take a while to reach maturity. I think new components like runners and
SDKs meet the bar to be started from a feature branch.
(Other features, like an IO connector or library of PTransforms, might also
qualify depending on complexity.)

We should also lay out what it takes to be considered mature enough to be
merged into master, since once that happens the component gets released to
users and failing tests become blocking issues. Here are some initial
thoughts to kick off the discussion...

In order to be merged into master, new components / major features should:

   - have at least 2 contributors interested in maintaining it, and 1
     committer interested in supporting it
   - provide both end-user and developer-facing documentation
   - have at least a basic level of unit test coverage
   - run all existing applicable integration tests with other Beam components
     and create additional tests as appropriate


In addition...

A runner should:

   - be able to handle a subset of the model that addresses a significant set
     of use cases (e.g. ‘traditional batch’ or ‘processing time streaming’)
   - update the capability matrix with the current status


An SDK* should:

   - provide the ability to construct graphs with all the basic building
     blocks of the model (ParDo, GroupByKey, Window, Trigger, etc.)
   - begin fleshing out the common composite transforms (Count, Join, etc.)
     and IO connectors (Text, Kafka, etc.)
   - have at least one runner that can execute the complete model (may be a
     direct runner)
   - provide integration tests for executing against current and future
     runners


* A note on DSLs:  I think it’s important to separate out an SDK from a
DSL, because in my mind the former is by definition equivalent to the Beam
model, while the latter may select portions of the model or change the
user-visible abstractions in order to provide a domain-specific experience.
We may want to encourage some DSLs to live separately from Beam because
they may look completely non-Beam-like to their end users. But we can
probably punt this decision until we have concrete examples to discuss.

Another fun part of this growth is that we’ll likely gain new committers.
And given the breadth of Beam, I think it would be useful to annotate our
committers page [2] with the components each person is most knowledgeable
about.

Looking forward to your thoughts.

[1]
http://mail-archives.apache.org/mod_mbox/incubator-beam-dev/201602.mbox/%3CCAAzyFAymVNpjQgZdz2BoMknnE3H9rYRbdnUemamt9Pavw8ugsw%40mail.gmail.com%3E

[2] http://beam.incubator.apache.org/team/
