On Wed, Dec 20, 2017 at 2:35 PM Robert Bradshaw <rober...@google.com> wrote:

> On Tue, Dec 19, 2017 at 4:18 PM, Henning Rohde <hero...@google.com> wrote:
> > Hi everyone,
> >
> > As part of the Go SDK development, I was looking at the guidelines for
> > merging new runners and SDKs into master [1] and I think they would
> > benefit from being updated to reflect the emerging portability framework.
> > Specific suggestions:
> >
> > (1) Both runners and SDKs should support the portability framework (to
> > the extent the model is supported by the runner/SDK). It would be
> > counter-productive at this time for the ecosystem to go against that
> > effort without a compelling reason. Direct runners not included.
>
> +1
>
> > (2) What is the minimal set of IO connectors a new SDK must support?
> > Given the upcoming cross-language feature in the portability framework,
> > can we rely on that to meet the requirement without implementing any
> > native IO connectors?
>
> It could be argued that there needs to be enough IO to write
> end-to-end examples such as WordCount and demonstrate what IOs would
> look like. TextIO may satisfy this. Once we have cross-language
> universal local runners, we could eschew even that (and perhaps the
> requirement would be simply that it runs against that runner).
>

This is probably up to individual SDK authors, but I feel we should at
least strongly encourage making a sources framework (SDF, if we are
looking to the future) and other I/O utilities (for example, a source
testing framework and FileIO) available in all SDKs. This would allow I/O
authors to freely choose the SDK(s) in which an I/O is natively
implemented, based on criteria such as (1) availability of client
libraries, (2) efficiency, (3) usability, and (4) other business
requirements. The rest of the SDKs could use the I/O in the form of a
cross-language transform.

- Cham


> > (3) Similarly to new runners, new SDKs should handle at least a useful
> > subset of the model, but not necessarily the whole model (at the time of
> > merge). A global-window-batch-only SDK targeting the portability
> > framework, for example, could be as useful a contribution in master as a
> > full model SDK that is supported by a direct runner only. Of course, this
> > is not to say that SDKs should not strive to support the full model, but
> > rather -- like Python streaming -- that it's fine to pursue that goal in
> > master beyond a certain point. That said, I'm curious as to why this
> > guideline for SDKs was originally set so specifically.
>
> While I don't think full model completeness is a feasible goal for
> merging into master (e.g. metadata driven triggers and retractions
> aren't even fully fleshed out yet), there are certain core pieces of
> the model that must be present, the notion of windowing among them. In
> addition, event-time driven windowing is one of the distinguishing
> features of Beam so it seems a regression to have SDKs without it, and
> may affect design choices like how windows and timestamps are assigned
> or inspected. Also, from a pragmatic point of view, accounting for and
> tracking the fact that each element has an associated window and
> timestamp that must flow through the pipeline and be taken into account
> during grouping is not something that is easily bolted on later to a
> global-window-batch-only SDK, and should be built in from the start,
> not offered as a vague promise someone will get to
> post-merge-to-master.
>
> I'd be OK with supporting only a subset of WindowFns, but it must be more
> than GlobalWindows alone (which is equivalent to "we'll just ignore the
> window everywhere").
>
> FWIW, streaming vs. batch is not part of the "model" per se, it's an
> operational difference. The full model was present in the SDK before
> any streaming backends were ready.
>
> > Finally, while portability support for various features -- such as side
> > input, cross-language I/O and the reference runner -- is still underway,
> > what should the guidelines be? For the Go SDK specifically, if in master,
> > it would bring the additional utility of helping test the portability
> > framework as it's being developed. On the other hand, it can't support
> > features that do not yet exist.
>
> Fortunately I think we have a little bit of time to get the full
> portability story into place before the Go SDK is ready to be merged.
> (On the note of helping development, I don't see anything that the Go
> SDK could offer specifically that the Python SDK can't.)
>
> In short, I think the list at
> https://beam.apache.org/contribute/feature-branches/ stands, with the
> additional requirement of Fn API support, and on that note (3) may be
> the (FnApi speaking) Reference Runner against which the IOs for (2)
> could be more easily satisfied.
>
> One more point of consideration: we should probably have at least one
> committer committed to (and in a position to) support it.
>
> - Robert
>
