Thanks for the comments!

> > (2) What is the minimal set of IO connectors a new SDK must support? Given
> > the upcoming cross-language feature in the portability framework, can we
> > rely on that to meet the requirement
> > without implementing any native IO connectors?
>
> It could be argued that there needs to be enough IO to write
> end-to-end examples such as WordCount and demonstrate what IOs would
> look like. TextIO may satisfy this. Once we have cross-language
> universal local runners, we could eschew even that (and perhaps the
> requirement would be simply that it runs against that runner).
>

Yes -- TextIO naturally gets added alongside a direct runner, but with
cross-language IO it may be only a toy version for that purpose. Real
pipelines might instead use a more robust version from another SDK, with
more supported filesystems, for example. The Go SDK is de facto leaning
towards that approach.


> > (3) Similarly to new runners, new SDKs should handle at least a useful
> > subset of the model, but not necessarily the whole model (at the time of
> > merge). A global-window-batch-only SDK targeting the portability framework,
> > for example, could be as useful a contribution in master as a full model SDK
> > that is supported by a direct runner only. Of course, this is not to say
> > that SDKs should not strive to support the full model, but rather -- like
> > Python streaming -- that it's fine to pursue that goal in master beyond a
> > certain point. That said, I'm curious as to why this guideline for SDKs was
> > set that specifically originally.
>
> While I don't think full model completeness is a feasible goal for
> merging into master (e.g. metadata driven triggers and retractions
> aren't even fully fleshed out yet), there are certain core pieces of
> the model that must be present, the notion of windowing among them. In
> addition, event-time driven windowing is one of the distinguishing
> features of Beam so it seems a regression to have SDKs without it, and
> may affect design choices like how windows and timestamps are assigned
> or inspected. Also, from a pragmatic point of view, accounting and
> tracking the fact that each element has an associated window and
> timestamp that must flow through the pipeline and be taken into account
> during grouping is not something that is easily bolted on later to a
> global-window-batch-only SDK, and should be built in from the start,
> not offered as a vague promise someone will get to
> post-merge-to-master.
>
> I'd be OK supporting only a subset of WindowFns, but more than
> GlobalWindows (equivalent to "we'll just ignore the window
> everywhere") only.
>
> FWIW, streaming vs. batch is not part of the "model" per se, it's an
> operational difference. The full model was present in the SDK before
> any streaming backends were ready.
>

These are good points. I am thinking about it more from the viewpoint of
what makes a useful contribution. For runners, for example, the guidelines
allow for a narrower focus -- traditional batch is called out -- and I think
that makes sense for SDKs as well. In both cases, one focus or another might
make it harder to support the full model later, but that seems beyond the
scope of a general guideline. The isolation added by the portability
framework hopefully also makes it less likely that any particular design
choice is too expensive to change later.


> > Finally, while portability support for various features -- such as side
> > input, cross-language I/O and the reference runner -- is still underway,
> > what should the guidelines be? For the Go SDK specifically, if in master, it
> > would bring the additional utility of helping test the portability framework
> > as it's being developed. On the other hand, it can't support features that
> > do not yet exist.
>
> Fortunately I think we have a little bit of time to get the full
> portability story into place before the Go SDK is ready to be merged.
> (On the note of helping development, I don't see anything that the Go
> SDK could offer specifically that the Python SDK can't.)
>

Indeed :). There is nothing specific the Go SDK would offer for helping
development other than better exercising the framework.


> In short, I think the list at
> https://beam.apache.org/contribute/feature-branches/ stands, with the
> additional requirement of Fn API support, and on that note (3) may be
> the (FnApi speaking) Reference Runner against which the IOs for (2)
> could be more easily satisfied.
>
> One more point of consideration, we should probably have at least one
> committer committed to (and in a position to) support it.
>

Makes sense, although I hadn't really thought about it for Go until now. What
would you suggest for new committer-less SDKs/runners, where a fair chunk
of the code would almost by construction be unfamiliar to committers? A
cool part of Beam portability IMO is that it opens the door for
non-mainstream languages to participate.
