Thanks for the update and I agree with the points that you have made. On Fri, Jan 10, 2020 at 5:58 PM Robert Burke <[email protected]> wrote:
> Thank you for sharing Daniel! > > Resolving SplittableDoFns for the Go SDK even just as far as initial > splitting will take the SDK that much closer to exiting its experimental > status. > > It's especially exciting seeing this work on Flink and on the Python > direct runner! > > On Fri, Jan 10, 2020, 5:36 PM Daniel Oliveira <[email protected]> > wrote: > >> Hey Beam devs, >> >> So several months ago I posted my Go SDF proposal and got a lot of good >> feedback (thread >> <https://lists.apache.org/thread.html/327bc72a0b30e18c6152b562bac2613c0edc942465d67b215830819e%40%3Cdev.beam.apache.org%3E>, >> doc <https://s.apache.org/beam-go-sdf>). Since then I've been working on >> implementing it and I've got an initial prototype ready to show off! It >> works with initial splitting on Flink, and has a decently documented API. >> Also in the second part of the email I'll also be proposing changes to the >> original doc, based on my experience working on this prototype. >> >> To be clear, this is *not* ready to officially go into Beam yet; the API >> is still likely to go through changes. Rather, I'm showing this off to show >> that progress is being made on SDFs, and to provide some context to the >> changes I'll be proposing below. >> >> Here's a link to the repo and branch so you can download it, and a link >> to the changes specifically: >> Repo: https://github.com/youngoli/beam/tree/gosdf >> Changes: >> https://github.com/apache/beam/commit/28140ee3471d6cb80e74a16e6fd108cc380d4831 >> >> If you give it a try and have any thoughts, please let me know! I'm open >> to any and all feedback. >> >> ================================== >> >> Proposed Changes >> Doc: https://s.apache.org/beam-go-sdf (Select "Version 1" from version >> history.) >> >> For anyone reading this who hasn't already read the doc above, I suggest >> reading it first, since I'll be referring to concepts from it. >> >> After working on the prototype I've changed my mind on the original >> decisions to go with an interface approach and a combined restriction + >> tracker. But I don't want to go all in and create another doc with a >> detailed proposal, so I've laid out a brief summary of the changes to get >> some initial feedback before I go ahead and start working on these changes >> in detail. Please let me know what you think! >> >> *1. Change from native Go interfaces to dynamic reflection-based API.* >> >> Instead of the native Go interfaces (SplittableDoFn, RProvider, and >> RTracker) described in the doc and implemented in the prototype, use the >> same dynamic approach that the Go SDK already uses for DoFns: Use the >> reflection system to examine the names and signatures of methods in the >> user's DoFn, RProvider, and RTracker. >> >> Original approach reasoning: >> >> - Simpler, so faster to implement and less bug-prone. >> - The extra burden on the user to keep types consistent is ok since >> most users of SDFs are more advanced >> >> Change reasoning: >> >> - In the prototype, I found interfaces to require too much extra >> boilerplate which added more complexity than expected. (Examples: Constant >> casting, >> - More consistent API: Inconsistency between regular DoFns (dynamic) >> and SDF API (interfaces) was jarring and unintuitive when implementing >> SDFs >> as a user. >> >> Implementation: Full details are up for discussion, but the goal is to >> make the RProvider and RTracker interfaces dynamic, so we can replace all >> instances of interface{} in the methods with the actual element types >> (i.e. fake generics). Also uses of the RProvider and RTracker interfaces in >> signatures can be replaced with the implementations of those >> providers/trackers. This will require a good amount of additional work in >> the DoFn validation codebase and the code generator. Plus a fair amount of >> additional user code validation will be needed and more testing since the >> new code is more complex. >> >> *2. Seperate the restriction tracker and restriction.* >> >> Currently the API has the restriction combined with the tracker. In most >> other SDKs and within the SDF model, the two are usually separate concepts, >> and this change is to follow that approach and split the two. >> >> Original approach reasoning: >> >> - It was considered simpler to avoid another level of type casting in >> the API with the interface approach. >> >> Change reasoning: >> >> - We are no longer going with the interface approach. With "fake >> generics", it is simpler to keep the two concepts separate. >> - Requiring users to specify custom coders in order to only encode >> the restriction and not the tracker ended up adding additional complexity >> anyway. >> >> Implementation: In the API have the restriction tracker initialized with >> a restriction object accessible via a getter. The restriction itself will >> be the only thing serialized, so it will be wrapped and unwrapped with the >> tracker before the user code is invoked. This wouldn't add very little work >> as it would mostly be bundled with the interface->dynamic approach change. >> >> >> Thanks, >> Daniel Oliveira >> >
