On Tue, Dec 19, 2017 at 4:18 PM, Henning Rohde <hero...@google.com> wrote:

> Hi everyone,
>
> As part of the Go SDK development, I was looking at the guidelines for
> merging new runners and SDKs into master [1] and I think they would benefit
> from being updated to reflect the emerging portability framework. Specific
> suggestions:
>
> (1) Both runners and SDKs should support the portability framework (to the
> extent the model is supported by the runner/SDK). It would be
> counter-productive at this time for the ecosystem to go against that effort
> without a compelling reason. Direct runners not included.
+1

> (2) What are the minimal set of IO connectors a new SDK must support? Given
> the upcoming cross-language feature in the portability framework, can we
> rely on that to meet the requirement without implementing any native IO
> connectors?

It could be argued that there needs to be enough IO to write end-to-end examples such as WordCount and to demonstrate what IOs would look like. TextIO may satisfy this. Once we have cross-language universal local runners, we could eschew even that (and perhaps the requirement would simply be that it runs against that runner).

> (3) Similarly to new runners, new SDKs should handle at least a useful
> subset of the model, but not necessarily the whole model (at the time of
> merge). A global-window-batch-only SDK targeting the portability framework,
> for example, could be as useful a contribution in master as a full model SDK
> that is supported by a direct runner only. Of course, this is not to say
> that SDKs should not strive to support the full model, but rather -- like
> Python streaming -- that it's fine to pursue that goal in master beyond a
> certain point. That said, I'm curious as to why this guideline for SDKs was
> set that specifically originally.

While I don't think full model completeness is a feasible goal for merging into master (e.g. metadata-driven triggers and retractions aren't even fully fleshed out yet), there are certain core pieces of the model that must be present, the notion of windowing among them. In addition, event-time-driven windowing is one of the distinguishing features of Beam, so it would seem a regression to have SDKs without it, and its absence could affect design choices such as how windows and timestamps are assigned or inspected.
Also, from a pragmatic point of view, accounting for and tracking the fact that each element has an associated window and timestamp, which must flow through the pipeline and be taken into account during grouping, is not something that is easily bolted onto a global-window-batch-only SDK later. It should be built in from the start, not offered as a vague promise that someone will get to post-merge-to-master. I'd be OK with supporting only a subset of WindowFns, but it needs to be more than just GlobalWindows (which is equivalent to "we'll just ignore the window everywhere").

FWIW, streaming vs. batch is not part of the "model" per se; it's an operational difference. The full model was present in the SDK before any streaming backends were ready.

> Finally, while portability support for various features -- such as side
> input, cross-language I/O and the reference runner -- is still underway,
> what should the guidelines be? For the Go SDK specifically, if in master, it
> would bring the additional utility of helping test the portability framework
> as it's being developed. On the other hand, it can't support features that
> do not yet exist.

Fortunately, I think we have a little bit of time to get the full portability story into place before the Go SDK is ready to be merged. (On the note of helping development, I don't see anything specific that the Go SDK could offer that the Python SDK can't.)

In short, I think the list at https://beam.apache.org/contribute/feature-branches/ stands, with the additional requirement of Fn API support, and on that note, (3) may be the (Fn API speaking) Reference Runner, against which the IOs for (2) could be more easily satisfied.

One more point for consideration: we should probably have at least one committer who is committed to (and in a position to) support it.

- Robert