On Tue, Dec 19, 2017 at 4:18 PM, Henning Rohde <hero...@google.com> wrote:
> Hi everyone,
>
>  As part of the Go SDK development, I was looking at the guidelines for
> merging new runners and SDKs into master [1] and I think they would benefit
> from being updated to reflect the emerging portability framework. Specific
> suggestions:
>
> (1) Both runners and SDKs should support the portability framework (to the
> extent the model is supported by the runner/SDK). It would be
> counter-productive at this time for the ecosystem to go against that effort
> without a compelling reason. Direct runners not included.

+1

> (2) What is the minimal set of IO connectors a new SDK must support? Given
> the upcoming cross-language feature in the portability framework, can we
> rely on that to meet the requirement without implementing any native IO
> connectors?

It could be argued that there needs to be enough IO to write
end-to-end examples such as WordCount and demonstrate what IOs would
look like. TextIO may satisfy this. Once we have cross-language
universal local runners, we could eschew even that (and perhaps the
requirement would be simply that it runs against that runner).

> (3) Similarly to new runners, new SDKs should handle at least a useful
> subset of the model, but not necessarily the whole model (at the time of
> merge). A global-window-batch-only SDK targeting the portability framework,
> for example, could be as useful a contribution in master as a full model SDK
> that is supported by a direct runner only. Of course, this is not to say
> that SDKs should not strive to support the full model, but rather -- like
> Python streaming -- that it's fine to pursue that goal in master beyond a
> certain point. That said, I'm curious why this guideline for SDKs was
> originally set so specifically.

While I don't think full model completeness is a feasible goal for
merging into master (e.g. metadata driven triggers and retractions
aren't even fully fleshed out yet), there are certain core pieces of
the model that must be present, the notion of windowing among them. In
addition, event-time-driven windowing is one of the distinguishing
features of Beam, so it seems a regression to have SDKs without it, and
its absence may affect design choices such as how windows and timestamps
are assigned or inspected. Also, from a pragmatic point of view,
accounting for and tracking the fact that each element has an associated
window and timestamp, which must flow through the pipeline and be taken
into account during grouping, is not something that is easily bolted on
later to a global-window-batch-only SDK. It should be built in from the
start, not offered as a vague promise that someone will get to it
post-merge-to-master.
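To make the point about per-element windows and timestamps concrete, here
is a minimal Go sketch. All type and function names are hypothetical and
are not the Go SDK's API; it assumes windows have already been assigned
and that timestamps are int64 milliseconds. What matters is that grouping
is done by (key, window), not by key alone.

```go
package main

import "fmt"

// IntervalWindow is a hypothetical half-open event-time window [Start, End),
// in milliseconds.
type IntervalWindow struct {
	Start, End int64
}

// WindowedValue is a hypothetical element representation: every element
// carries its event timestamp and the window it was assigned to.
type WindowedValue struct {
	Key       string
	Value     int
	Timestamp int64
	Window    IntervalWindow
}

// groupKey combines the user key with the window, so grouping never
// merges elements that belong to different windows.
type groupKey struct {
	Key    string
	Window IntervalWindow
}

// GroupByKeyAndWindow groups values by (key, window) rather than by key alone.
func GroupByKeyAndWindow(in []WindowedValue) map[groupKey][]int {
	out := make(map[groupKey][]int)
	for _, wv := range in {
		k := groupKey{Key: wv.Key, Window: wv.Window}
		out[k] = append(out[k], wv.Value)
	}
	return out
}

func main() {
	w1 := IntervalWindow{Start: 0, End: 60}
	w2 := IntervalWindow{Start: 60, End: 120}
	in := []WindowedValue{
		{Key: "a", Value: 1, Timestamp: 10, Window: w1},
		{Key: "a", Value: 2, Timestamp: 70, Window: w2},
		{Key: "a", Value: 3, Timestamp: 20, Window: w1},
	}
	groups := GroupByKeyAndWindow(in)
	fmt.Println(len(groups))                                  // 2
	fmt.Println(groups[groupKey{Key: "a", Window: w1}])       // [1 3]
}
```

In a global-window-only SDK the Window field would be constant, so grouping
by key alone gives the same answer; that is what makes it tempting to leave
out, and why every grouping path has to change if it is added later.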

I'd be OK with supporting only a subset of WindowFns, but that subset must
include more than just GlobalWindows (which is equivalent to "we'll just
ignore the window everywhere").
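As a rough illustration of the difference between GlobalWindows and a
non-trivial WindowFn, the hypothetical Go sketch below (again not the Go
SDK's actual API) assigns an event timestamp, in int64 milliseconds, to
windows. GlobalWindows puts every element into one window regardless of its
timestamp, which is what "ignore the window everywhere" amounts to;
FixedWindows actually depends on the timestamp.

```go
package main

import (
	"fmt"
	"math"
)

// Window is a hypothetical half-open event-time interval [Start, End) in ms.
type Window struct {
	Start, End int64
}

// GlobalWindow spans all of event time.
var GlobalWindow = Window{Start: math.MinInt64, End: math.MaxInt64}

// WindowFn is a hypothetical interface for assigning windows to an element
// based on its event timestamp.
type WindowFn interface {
	AssignWindows(timestamp int64) []Window
}

// GlobalWindows ignores the timestamp and always returns the global window.
type GlobalWindows struct{}

func (GlobalWindows) AssignWindows(int64) []Window { return []Window{GlobalWindow} }

// FixedWindows assigns each timestamp to the single window of width Size
// that contains it.
type FixedWindows struct{ Size int64 }

func (f FixedWindows) AssignWindows(ts int64) []Window {
	start := ts - mod(ts, f.Size)
	return []Window{{Start: start, End: start + f.Size}}
}

// mod returns a non-negative remainder so negative timestamps are handled.
func mod(a, b int64) int64 {
	r := a % b
	if r < 0 {
		r += b
	}
	return r
}

func main() {
	fns := []WindowFn{GlobalWindows{}, FixedWindows{Size: 60}}
	for _, fn := range fns {
		fmt.Println(fn.AssignWindows(125))
	}
}
```

AssignWindows returns a slice because in the Beam model some WindowFns, such
as sliding windows, can assign a single element to several windows.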

FWIW, streaming vs. batch is not part of the "model" per se; it's an
operational difference. The full model was present in the SDK before
any streaming backends were ready.

> Finally, while portability support for various features -- such as side
> input, cross-language I/O and the reference runner -- is still underway,
> what should the guidelines be? For the Go SDK specifically, if in master, it
> would bring the additional utility of helping test the portability framework
> as it's being developed. On the other hand, it can't support features that
> do not yet exist.

Fortunately, I think we have a little bit of time to get the full
portability story into place before the Go SDK is ready to be merged.
(On the note of helping development, I don't see anything that the Go
SDK could offer specifically that the Python SDK can't.)

In short, I think the list at
https://beam.apache.org/contribute/feature-branches/ stands, with the
additional requirement of Fn API support. On that note, (3) may be the
(Fn API-speaking) Reference Runner, against which the IOs for (2) could
be more easily satisfied.

One more point to consider: we should probably have at least one
committer who is committed to supporting it (and is in a position to do so).

- Robert
