On Wed, Dec 20, 2017 at 2:35 PM Robert Bradshaw <rober...@google.com> wrote:
> On Tue, Dec 19, 2017 at 4:18 PM, Henning Rohde <hero...@google.com> wrote:
> > Hi everyone,
> >
> > As part of the Go SDK development, I was looking at the guidelines for
> > merging new runners and SDKs into master [1] and I think they would
> > benefit from being updated to reflect the emerging portability framework.
> > Specific suggestions:
> >
> > (1) Both runners and SDKs should support the portability framework (to
> > the extent the model is supported by the runner/SDK). It would be
> > counter-productive at this time for the ecosystem to go against that
> > effort without a compelling reason. Direct runners not included.
>
> +1
>
> > (2) What is the minimal set of IO connectors a new SDK must support?
> > Given the upcoming cross-language feature in the portability framework,
> > can we rely on that to meet the requirement without implementing any
> > native IO connectors?
>
> It could be argued that there needs to be enough IO to write
> end-to-end examples such as WordCount and demonstrate what IOs would
> look like. TextIO may satisfy this. Once we have cross-language
> universal local runners, we could eschew even that (and perhaps the
> requirement would be simply that it runs against that runner).

This is probably up to individual SDK authors, but I feel we should at
least strongly encourage making the source framework (SDF, if we are
looking ahead) and other I/O utilities (for example, the source testing
framework and FileIO) available in all SDKs. This will allow I/O authors
to freely decide the SDK(s) in which an I/O is natively implemented,
based on criteria such as (1) availability of client libraries,
(2) efficiency, (3) usability, and (4) other business requirements. The
rest of the SDKs could use the I/O in the form of a cross-language
transform.

- Cham

> > (3) Similarly to new runners, new SDKs should handle at least a useful
> > subset of the model, but not necessarily the whole model (at the time of
> > merge).
> > A global-window-batch-only SDK targeting the portability framework,
> > for example, could be as useful a contribution in master as a full model
> > SDK that is supported by a direct runner only. Of course, this is not to
> > say that SDKs should not strive to support the full model, but rather --
> > like Python streaming -- that it's fine to pursue that goal in master
> > beyond a certain point. That said, I'm curious as to why this guideline
> > for SDKs was set that specifically originally.
>
> While I don't think full model completeness is a feasible goal for
> merging into master (e.g. metadata driven triggers and retractions
> aren't even fully fleshed out yet), there are certain core pieces of
> the model that must be present, the notion of windowing among them. In
> addition, event-time driven windowing is one of the distinguishing
> features of Beam so it seems a regression to have SDKs without it, and
> it may affect design choices like how windows and timestamps are
> assigned or inspected. Also, from a pragmatic point of view, accounting
> for and tracking the fact that each element has an associated window
> and timestamp that must flow through the pipeline and be taken into
> account during grouping is not something that is easily bolted on later
> to a global-window-batch-only SDK, and should be built in from the
> start, not offered as a vague promise someone will get to
> post-merge-to-master.
>
> I'd be OK supporting only a subset of WindowFns, but it must be more
> than GlobalWindows alone (which is equivalent to "we'll just ignore the
> window everywhere").
>
> FWIW, streaming vs. batch is not part of the "model" per se, it's an
> operational difference. The full model was present in the SDK before
> any streaming backends were ready.
>
> > Finally, while portability support for various features -- such as side
> > input, cross-language I/O and the reference runner -- is still underway,
> > what should the guidelines be?
> > For the Go SDK specifically, if in master, it would bring the
> > additional utility of helping test the portability framework as it's
> > being developed. On the other hand, it can't support features that do
> > not yet exist.
>
> Fortunately I think we have a little bit of time to get the full
> portability story into place before the Go SDK is ready to be merged.
> (On the note of helping development, I don't see anything that the Go
> SDK could offer specifically that the Python SDK can't.)
>
> In short, I think the list at
> https://beam.apache.org/contribute/feature-branches/ stands, with the
> additional requirement of Fn API support, and on that note (3) may be
> the (FnApi speaking) Reference Runner against which the IOs for (2)
> could be more easily satisfied.
>
> One more point of consideration, we should probably have at least one
> committer committed to (and in a position to) support it.
>
> - Robert
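
To make the point about windows and timestamps flowing through the
pipeline concrete, here is a minimal sketch in plain Python. It does not
use the Beam API: WindowedValue, IntervalWindow, assign_fixed_windows and
group_by_key_and_window are hypothetical names chosen for illustration,
and fixed 60-second windows are just an assumed example of a WindowFn
beyond GlobalWindows.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class IntervalWindow:
    """Half-open event-time interval [start, end), in seconds."""
    start: int
    end: int


@dataclass(frozen=True)
class WindowedValue:
    """An element plus the metadata that has to travel with it."""
    value: Any
    timestamp: int
    window: IntervalWindow


def assign_fixed_windows(
    elements: Iterable[Tuple[Any, int]], size: int
) -> List[WindowedValue]:
    """Place each (value, timestamp) pair into the fixed window containing it."""
    out = []
    for value, ts in elements:
        start = ts - (ts % size)
        out.append(WindowedValue(value, ts, IntervalWindow(start, start + size)))
    return out


def group_by_key_and_window(
    elements: Iterable[WindowedValue],
) -> Dict[Tuple[Any, IntervalWindow], List[Any]]:
    """Group (key, value) elements per key AND per window, not per key alone."""
    groups = defaultdict(list)
    for wv in elements:
        key, val = wv.value
        groups[(key, wv.window)].append(val)
    return dict(groups)


# ((key, value), event_timestamp_in_seconds)
events = [(("a", 1), 5), (("a", 2), 65), (("b", 3), 10), (("a", 4), 50)]
grouped = group_by_key_and_window(assign_fixed_windows(events, size=60))

for (key, window), values in sorted(
    grouped.items(), key=lambda kv: (kv[0][0], kv[0][1].start)
):
    print(key, window.start, window.end, values)
# a 0 60 [1, 4]
# a 60 120 [2]
# b 0 60 [3]
```

In a design that ignores windows everywhere, the grouping key would be
the element key alone and "a" would collapse into a single group. Adding
the window to that key later touches every grouping step, which is one
way to read the "not easily bolted on" concern.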