Re: [DISCUSS] Guidelines for merging new runners and SDKs into master

Lukasz Cwik Thu, 21 Dec 2017 09:41:01 -0800

I would add that we should look for test suites and test
utilities when a new runner/SDK is merged. Tests drive the
majority of our validation and help reduce the maintenance burden. I'm
thinking of the suite of ValidatesRunner tests that live within the Java
and Python SDKs. The validation goes both ways: it finds bugs in
execution assumptions on both the Runner and SDK sides. Concepts like
PAssert, TestStream, DoFnTester, source testing utilities, and coder
testing utilities are very useful as well.


On Wed, Dec 20, 2017 at 9:48 PM, Robert Bradshaw <rober...@google.com>
wrote:

> On Wed, Dec 20, 2017 at 6:45 PM, Henning Rohde <hero...@google.com> wrote:
> >> > (3) Similarly to new runners, new SDKs should handle at least a useful
> >> > subset of the model, but not necessarily the whole model (at the time of
> >> > merge). A global-window-batch-only SDK targeting the portability
> >> > framework, for example, could be as useful a contribution in master as a
> >> > full model SDK that is supported by a direct runner only. Of course, this
> >> > is not to say that SDKs should not strive to support the full model, but
> >> > rather -- like Python streaming -- that it's fine to pursue that goal in
> >> > master beyond a certain point. That said, I'm curious as to why this
> >> > guideline for SDKs was set that specifically originally.
> >>
> >> While I don't think full model completeness is a feasible goal for
> >> merging into master (e.g. metadata driven triggers and retractions
> >> aren't even fully fleshed out yet), there are certain core pieces of
> >> the model that must be present, the notion of windowing among them. In
> >> addition, event-time driven windowing is one of the distinguishing
> >> features of Beam so it seems a regression to have SDKs without it, and
> >> may affect design choices like how windows and timestamps are assigned
> >> or inspected. Also, from a pragmatic point of view, accounting for and
> >> tracking the fact that each element has an associated window and
> >> timestamp that must flow through the pipeline and be taken into account
> >> during grouping is not something that is easily bolted on later to a
> >> global-window-batch-only SDK, and should be built in from the start,
> >> not offered as a vague promise someone will get to
> >> post-merge-to-master.
> >>
> >> I'd be OK with supporting only a subset of WindowFns, but it has to be
> >> more than GlobalWindows only (which is equivalent to "we'll just ignore
> >> the window everywhere").
> >>
> >> FWIW, streaming vs. batch is not part of the "model" per se, it's an
> >> operational difference. The full model was present in the SDK before
> >> any streaming backends were ready.
> >
> > These are good points. I am more thinking about it from a viewpoint of what
> > makes a useful contribution. For runners, for example, the guidelines allow
> > for a narrower focus -- traditional batch is called out -- and I think that
> > makes sense for SDKs as well. In both cases, one focus or another might make
> > it harder to support the full model later, but that seems beyond the scope
> > of a general guideline. The portability framework hopefully also makes it
> > less likely that any particular design choice is too expensive to change
> > later, thanks to the added isolation.
>
> Some runners are batch only, some runners are streaming only, but all
> support windowing. Fortunately the portability framework makes it
> easier to support this. To not have windowing at all is too severe an
> omission in the model in my book, and reflects a significant omission in
> the API and likely the implementation as well.
>
> If it's easy to add, it's not a big hurdle. If it's difficult to
> retrofit after the fact, all the more reason to get it in early :).
>
> >> One more point of consideration: we should probably have at least one
> >> committer committed to (and in a position to) support it.
> >
> > Makes sense, although I hadn't really thought about it for Go until now.
> > What would you suggest for new committer-less SDKs/runners, where a fair
> > chunk of the code would almost by construction be unfamiliar to committers?
> > A cool part of Beam portability IMO is that it opens the door for
> > non-mainstream languages to participate.
>
> That's a good question. I would hope that by the time an SDK becomes
> mature enough to merge, at least some of the participants have become
> involved enough with the community to merit being committers.
>
> - Robert
>
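The point in the thread about each element carrying a window and timestamp that must be taken into account during grouping can be illustrated with a small sketch. This is plain Python, not code from any Beam SDK: the names IntervalWindow, WindowedValue, assign_fixed_windows, assign_global_window and group_by_key_and_window are all hypothetical, and the 60-second window size is an arbitrary assumption. It only shows why grouping keyed on (key, window) differs from a global-window-only design that groups on the key alone.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, NamedTuple, Tuple


class IntervalWindow(NamedTuple):
    """Hypothetical half-open event-time window [start, end), in seconds."""
    start: float
    end: float


class WindowedValue(NamedTuple):
    """Hypothetical element wrapper: the value plus its timestamp and window."""
    key: str
    value: int
    timestamp: int
    window: IntervalWindow


# A single window covering all of event time (the "ignore the window" case).
GLOBAL_WINDOW = IntervalWindow(float("-inf"), float("inf"))

Event = Tuple[str, int, int]  # (key, value, event-time timestamp in seconds)


def assign_fixed_windows(events: Iterable[Event], size: int) -> Iterator[WindowedValue]:
    """Place each element in the fixed window of `size` seconds containing its timestamp."""
    for key, value, ts in events:
        start = ts - (ts % size)
        yield WindowedValue(key, value, ts, IntervalWindow(start, start + size))


def assign_global_window(events: Iterable[Event]) -> Iterator[WindowedValue]:
    """Place every element in the one global window."""
    for key, value, ts in events:
        yield WindowedValue(key, value, ts, GLOBAL_WINDOW)


def group_by_key_and_window(
    elements: Iterable[WindowedValue],
) -> Dict[Tuple[str, IntervalWindow], List[int]]:
    """Group values per (key, window) pair rather than per key alone."""
    groups: Dict[Tuple[str, IntervalWindow], List[int]] = defaultdict(list)
    for element in elements:
        groups[(element.key, element.window)].append(element.value)
    return dict(groups)


events = [("a", 1, 5), ("a", 2, 65), ("b", 3, 10), ("a", 4, 59)]

# With 60-second fixed windows, key "a" splits into two groups.
fixed = group_by_key_and_window(assign_fixed_windows(events, 60))

# With only the global window, all values for a key land in one group.
global_only = group_by_key_and_window(assign_global_window(events))
```

In this sketch the grouping function is identical in both cases; only the window assignment differs. That is the sense in which an SDK that carries windows from the start can add further WindowFns later, whereas one that groups on the key alone has to change its element representation and its grouping logic.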
