Huge +1

This is definitely something many people have asked about, so it is
great to see it finally happening.

On Wed, Jun 16, 2021 at 7:56 PM Kenneth Knowles <[email protected]> wrote:
>
> +1 awesome
>
> On Wed, Jun 16, 2021 at 10:33 AM Robert Burke <[email protected]> wrote:
>>
>> Sounds reasonable to me. I agree. We'll aim to get those (Go modules and 
>> LICENSE issue) done before the 2.32 cut, and certainly before the 2.33 cut 
>> if release images aren't added to the 2.32 process.
>>
>> Regarding Go Generics: at some point in the future, we may want a harder 
>> break between a newer generics-first API and the current version, but 
>> there's no rush. Generics/TypeParameters in Go aren't identical to the 
>> feature referred to by that term in Java, C++, Rust, etc, so it'll take a 
>> bit of time for that expertise to develop.
>>
>> However, by the current nature of Go, we had to have pretty sophisticated 
>> reflective analysis to handle DoFns and map them to their graph inputs. So 
>> adding new helpers like KV, emitter, and iterator types shouldn't be too 
>> difficult. Changing Go SDK internals to use generics (like the 
>> implementation of Stats DoFns like Min, Max, etc) could also be done 
>> transparently to most users, and certainly any of the framework for 
>> execution time handling (the "worker's SDK harness") could be cleaned up 
>> if need be. Finally, adding more sophisticated DoFn registration and code 
>> generation could replace the optional code generator entirely, saving 
>> some users a `go generate` step and making it simpler to get improved 
>> execution performance.
>>
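
As a rough illustration of the kind of helpers discussed above, here is a minimal sketch in plain Go, assuming the syntax from the accepted type parameters proposal (Go 1.18 or later). The names Ordered, KV, Min, and MinPerKey are hypothetical and are not part of the Beam Go SDK; the sketch only shows how a typed KV and a generic Min could look, not how the SDK would implement them.

```go
package main

import "fmt"

// Ordered is a hypothetical constraint covering a few built-in ordered types.
type Ordered interface {
	~int | ~int64 | ~float64 | ~string
}

// KV is a hypothetical typed key-value pair (not the Beam SDK's KV).
type KV[K comparable, V any] struct {
	Key   K
	Value V
}

// Min returns the smallest element of xs, or false if xs is empty.
func Min[T Ordered](xs []T) (T, bool) {
	var zero T
	if len(xs) == 0 {
		return zero, false
	}
	m := xs[0]
	for _, x := range xs[1:] {
		if x < m {
			m = x
		}
	}
	return m, true
}

// MinPerKey keeps the smallest value seen for each key.
func MinPerKey[K comparable, V Ordered](kvs []KV[K, V]) map[K]V {
	out := make(map[K]V)
	for _, kv := range kvs {
		if cur, ok := out[kv.Key]; !ok || kv.Value < cur {
			out[kv.Key] = kv.Value
		}
	}
	return out
}

func main() {
	m, ok := Min([]int{3, 1, 2})
	fmt.Println(m, ok)
	fmt.Println(MinPerKey([]KV[string, int]{{"a", 3}, {"a", 1}, {"b", 2}}))
}
```

The same Min works for ints, floats, and strings without reflection, which is the sort of internal cleanup that could stay invisible to users of the existing API.
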
>> Changing things like making a Type Parameterized PCollection would be far 
>> more involved, as would trying to use some kind of Apply format. The lack of 
>> Method Overrides prevents the apply chaining approach. Or at least prevents 
>> it from working simply.
>>
>> Finally, Go Generics won't be available until Go 1.18, which isn't until 
>> next year. See https://blog.golang.org/generics-proposal for details.
>>
>> Go 1.17 https://tip.golang.org/doc/go1.17 does include a register-based 
>> calling convention, leading to a modest performance improvement across the 
>> board.
>>
>> Cheers,
>> Robert Burke
>>
>> On 2021/06/15 18:10:46, Robert Bradshaw <[email protected]> wrote:
>> > +1 to declaring Golang support out of experimental once the Go Modules
>> > issues are solved. I don't think an SDK needs to support every feature
>> > to be accepted, especially now that we can do cross-language
>> > transforms, and Go definitely supports enough to be quite useful. (WRT
>> > streaming, my understanding is that Go supports the streaming model
>> > with windows and timestamps, and runs fine on a streaming runner, even
>> > if more advanced features like state and timers aren't yet available.)
>> >
>> > This is a great milestone.
>> >
>> > On Tue, Jun 15, 2021 at 10:12 AM Tyson Hamilton <[email protected]> wrote:
>> > >
>> > > WOW! Big news.
>> > >
>> > > I'm supportive of leaving experimental status after Go Modules are 
>> > > completed and the LICENSE issue is resolved. I don't think that lacking 
>> > > streaming support is a blocker. The other thing I checked to see was if 
>> > > there were metrics available on metrics.beam.apache.org, specifically 
>> > > for measuring code health via post-commit over time, which there are and 
>> > > the passing test rate is high (Huzzah!). The one thing that surprised me 
>> > > from your summary is that when Go introduces generics it won't result in 
>> > > any backwards incompatible changes in Apache Beam. That's great news, 
>> > > but does it mean there will be a need to support both non-generic and 
>> > > generic APIs moving forward? It seems like generics will be introduced 
>> > > in the Go 1.17 release (optimistically) in August this year.
>> > >
>> > >
>> > >
>> > > On Thu, Jun 10, 2021 at 5:04 PM Robert Burke <[email protected]> wrote:
>> > >>
>> > >> Hello Beam Community!
>> > >>
>> > >> I propose we stop calling the Apache Beam Go SDK experimental.
>> > >>
>> > >> This thread is to discuss it as a community, and any conditions that 
>> > >> remain that would prevent the exit.
>> > >>
>> > >> tl;dr;
>> > >> Ask Questions for answers and links! I have both.
>> > >> This entails including it officially in the Release process, removing 
>> > >> the various "experimental" text throughout the repo etc,
>> > >> and otherwise treating it like Python and Java. There are also some Go 
>> > >> specific tasks around dependency versioning.
>> > >>
>> > >> The Go SDK implements the Beam model efficiently for most batch tasks, 
>> > >> including basic windowing.
>> > >> Apache Beam Go jobs can execute, and are tested on all Portable runners.
>> > >> The core APIs are not going to change in incompatible ways going 
>> > >> forward.
>> > >> Scalable transforms can be written through SplittableDoFns or via Cross 
>> > >> Language transforms.
>> > >>
>> > >> The SDK isn't 100% feature complete, but keeping it experimental 
>> > >> doesn't help with that any further.
>> > >> Communities grow through contributions and use, and experimental 
>> > >> markers dissuade users.
>> > >> There's plenty to do in order to expand what can be done with the SDK. 
>> > >> (Contributions welcome)
>> > >>
>> > >> Why Exit Experimental now?
>> > >>
>> > >> Typically when we call an SDK or API Experimental, it's because there's 
>> > >> a risk that API or behaviors may change significantly.
>> > >> This, in turn, leads to additional work for users of the SDK on every 
>> > >> release which leads to sticking to older versions or forking
>> > >> to preserve behavior. Version updates should be looked forward to, and 
>> > >> viewed as having little risk. Further, while there's been
>> > >> previous discussion about what the "low bar" is for a new SDK, it hasn't 
>> > >> been summarily applied to the Go SDK. I feel this has
>> > >> hurt development and contribution of new SDK languages (inherent 
>> > >> difficulty of SDK development notwithstanding).
>> > >>
>> > >> When the SDK was designed, it wasn't entirely clear what the Beam Model 
>> > >> should look like in an opinionated language like Go.
>> > >> The initial take (see https://s.apache.org/beam-go-sdk-design-rfc 
>> > >> [0]) goes into detail on what it means for a language without
>> > >> Generics, or overloading, or inheritance to implement the Beam model. 
>> > >> One could largely throw away static types (like Python),
>> > >> but this approach rings hollow for Go. It would not do if the approach 
>> > >> couldn't grow and scale to the Beam Model. It's also hard
>> > >> to tell if an API is any good before there are users.
>> > >>
>> > >> Further, in the early days of Portability, there wasn't a way to write 
>> > >> scalable DoFns, dynamically or otherwise. It's an incredible
>> > >> bottleneck to need to do all initial fanout of work on a single 
>> > >> machine, write everything to a Reshuffle, just in order to scale up.
>> > >> Without being able to scale, Beam is little more than overhead.
>> > >>
>> > >> At this point, both of these needs are met within the Go SDK for open 
>> > >> source.
>> > >>
>> > >> Background
>> > >>
>> > >> The Go SDK has been a part of the Beam repo for a few years now, since 
>> > >> it was accidentally merged into master.
>> > >> Since then it's been called experimental, and not officially part of 
>> > >> the releases.
>> > >>
>> > >> Of the SDKs, it was always designed around Beam Portability first. It 
>> > >> never had any "Legacy" (SDK x Runner specific) workers.
>> > >> It's always used the Beam Pipeline protos and FnAPI to execute jobs, 
>> > >> first with some very experimental code on Dataflow, but now
>> > >> on all portable supported runners, like Flink, Spark, the Python 
>> > >> Portable runner, and Dataflow.
>> > >>
>> > >> API Stability
>> > >>
>> > >> The Go SDK hasn't meaningfully changed its user API for DoFn and 
>> > >> pipeline construction since it was first merged in, and there are no
>> > >> changes to that on the horizon that can't be made in a backwards 
>> > >> compatible manner. Largely these are related to New Features, or
>> > >> usability improvements enabled by the advent of Go Generics (think of 
>> > >> "real" KV, emitter, and iterator types).
>> > >>
>> > >> It's an open secret that the Go SDK has largely been under work for use 
>> > >> within Google. Its use there is called FlumeGo, representing
>> > >> the Apache Beam Go SDK, running on top of Flume, Google's batch 
>> > >> pipeline processing engine. Thus most of the focus has been on improving
>> > >> batch execution. FlumeGo sees ample use today, and there hasn't been a 
>> > >> call for fundamental changes to the API for ergonomic or
>> > >> usability concerns.
>> > >>
>> > >> Scalability
>> > >>
>> > >> Google could get away without the Go SDK having an SDK side scalability 
>> > >> solution as a result of its integration with Flume.
>> > >> However, those days are now past.
>> > >>
>> > >> The Go SDK now supports SplittableDoFns along with Dynamic Splitting, 
>> > >> which supports writing scalable batch transforms natively
>> > >> in the Go SDK.
>> > >> The SDK also supports Cross Language Transforms, with Beam Schema 
>> > >> encodings. With it, production hardened transforms
>> > >> from Java and Python are a wrapper away.
>> > >>
>> > >> Presently, Daniel Oliveira (who implemented the SDF side work, and 
>> > >> completed the Xlang work) is adding a wrapper for the
>> > >> Java Kafka IO using Cross Language Transforms, which has often been 
>> > >> requested. This will also enable use of the Beam SQL
>> > >> transforms that Java enables.
>> > >>
>> > >> Features
>> > >>
>> > >> The Go SDK implements the Beam core. The Go SDK implements standard 
>> > >> coders, allows for user DoFns, and CombineFns and access
>> > >> to core transforms like Flatten, GroupByKey, and features like Side 
>> > >> Inputs, Windowing, and User Metrics.
>> > >> Basic windowing will be fully supported for batch even through lifted 
>> > >> combines in the 2.32.0 release.
>> > >>
>> > >> All of the above enables Beam Go to be versatile for batch execution on 
>> > >> portable runners, and for simple streaming pipelines.
>> > >>
>> > >> Repo Testing
>> > >>
>> > >> On precommit the Go SDK runs all its unit tests. On top of that, it 
>> > >> runs all its integration tests against the Python Portable runner,
>> > >> making it quick and robust to detect breaking changes without 
>> > >> overspending community resources. Those same tests are also
>> > >> run against Dataflow, Flink, and Spark.
>> > >>
>> > >> The tests are executable against all runners via the appropriate Go 
>> > >> commands (if you've stood up your own job management server),
>> > >> or Gradle commands (which will spin up runner instances for you). 
>> > >> Documentation for executing tests and adding new ones
>> > >> is on the wiki. [2] They are accessible to Go developers as they're 
>> > >> implemented with the standard Go testing tools.
>> > >>
>> > >> Shortcomings
>> > >> That said, there's still much to do. Let me briefly tell you what 
>> > >> doesn't work, and it's up to you to weigh whether they block
>> > >> being out of experimental.
>> > >>
>> > >> At present, only textio has been implemented as a Splittable DoFn.
>> > >> Once the Kafka wrapper is merged in, it will serve as the first 
>> > >> example for future contributions for
>> > >> new transform wrappers for the Go SDK.
>> > >> Transforms and IOs are lacking, but at this point users are empowered 
>> > >> to write their own DoFns or wrap existing transforms for Cross Language 
>> > >> use.
>> > >>
>> > >> In the core SDK, more streaming focused features have yet to be 
>> > >> implemented, but they're largely additions to what exists already
>> > >> rather than total rebuilds. Much of the work is defining how a user 
>> > >> specifies their desires, and turning those into the appropriate
>> > >> FnAPI requests at execution time. Back in October I wrote at length on 
>> > >> the wiki [1] what's missing for additional streaming features.
>> > >>
>> > >> While we have bolstered our testing recently, there's likely still more 
>> > >> we could test to improve our confidence in the SDK,
>> > >> in particular regarding the included transforms libraries and examples.
>> > >>
>> > >> Moving Forward
>> > >>
>> > >> My immediate plan is to work on incorporating the Go SDK fully into the 
>> > >> Beam Programming Guide. I've audited the guide [3], and
>> > >> am beginning to add missing content and fill in the Go specific 
>> > >> gaps. This will be tied to improving the Go Doc with more Go
>> > >> specific user documentation that isn't appropriate for the BPG,
>> > >> and to resolving the LICENSE issue around the public display of that GoDoc.
>> > >>
>> > >> If this proposal is accepted by a binding vote, I will incorporate the 
>> > >> SDK into the release process, and remove the "experimental"
>> > >> language around the SDK. This largely entails updating the release 
>> > >> scripts to also build and publish the Go SDK Docker containers.
>> > >> As for releasing the code, we're technically already doing so whenever 
>> > >> we tag a release branch [4].
>> > >>
>> > >> The clearest signal to the Go community however will be migrating the 
>> > >> SDK to use Go Modules for dependency version control,
>> > >> which Daniel is planning on working on after his Kafka task. This will 
>> > >> put our repo infrastructure, SDK contributors, and users
>> > >> on the same footing when it comes to dependency management. It will 
>> > >> remove the "+incompatible" tags one sees on the
>> > >> pkg.go.dev list at [4].
>> > >>
>> > >> I'm very happy to answer any questions you might have about the SDK, 
>> > >> and provide additional links as needed. I intentionally avoided
>> > >> a link barrage in this email, as they can distract from the point: The 
>> > >> SDK is ready for folks to use it, and we need to tell them that they can
>> > >> rather than that they shouldn't.
>> > >>
>> > >> Robert Burke
>> > >> De facto Beam Go TL
>> > >>
>> > >> [0] https://s.apache.org/beam-go-sdk-design-rfc
>> > >> [1] 
>> > >> https://cwiki.apache.org/confluence/display/BEAM/Supporting+Streaming+in+the+Go+SDK
>> > >> [2] https://cwiki.apache.org/confluence/display/BEAM/Go+Tips
>> > >> [3] 
>> > >> https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090
>> > >>  (SDK Audit sheet)
>> > >> [4] 
>> > >> https://pkg.go.dev/github.com/apache/beam/sdks/go/pkg/beam?tab=versions
>> >
