It's my great pleasure to announce that the Apache Beam Go SDK is no longer experimental. https://beam.apache.org/blog/go-sdk-release/
Thank you everyone. Robert Burke Beam Go Busybody On Thu, Nov 4, 2021, 6:29 PM Robert Burke <[email protected]> wrote: > At this point I just need an LGTM on the blog post PR, as the draft is > finalized. > > Udi added the sdks/v2.33.0 tag which works as expected. I've also verified > that the appropriate container is used by default when not specified which > is the last unknown in this process. > > Who's ready to release a new SDK? I am! > > https://github.com/apache/beam/pull/15894 (or join the exciting reaction > emoji on the top post). > > > > On Wed, Nov 3, 2021, 8:37 PM Robert Burke <[email protected]> wrote: > >> The current draft of the exit blog post is >> https://github.com/apache/beam/pull/15894 >> Comments are very welcome. I'm going to continue looking for Known issues >> (which will be linked to their respective JIRAs) tomorrow. >> >> Since RC1 is getting cycled, I can also go back to the original plan of >> v2.33.0, if we'd like to get it out this week. >> >> >> On Wed, 3 Nov 2021 at 10:17, Robert Burke <[email protected]> wrote: >> >>> Investigation yielded that there's no way around the prefixed tags. The >>> JIRA has been commented with the explanation. >>> >>> https://github.com/apache/beam/pull/15881 has the release script >>> updates. >>> >>> I'm working on the Exit blogpost and the updated Go SDK roadmap. The >>> draft PR will be linked here. >>> >>> Since 2.34.0 is almost out (assuming RC1 verification goes well) I'm >>> inclined to wait for that release to finish before publishing the blogpost. >>> I'll link the draft PR here as soon as it's ready. >>> >>> Once 2.34.0 is released, I'm inclined to still have 2.33.0 be also >>> prefix tagged so there isn't a gap in versions between the unmoduled code >>> and moduled code. >>> >>> Once published, that'll be the end of this thread. >>> >>> Thank you very much everyone. 
>>> >>> Robert Burke >>> Beam Go Busybody >>> >>> On Tue, Oct 26, 2021, 5:36 PM Kyle Weaver <[email protected]> wrote: >>> >>>> +1 to extra tags. They'll be trivial to add to our release process, and >>>> git tags are lightweight by design so I don't foresee any problems. >>>> >>>> On Tue, Oct 26, 2021 at 5:27 PM Robert Bradshaw <[email protected]> >>>> wrote: >>>> >>>>> Glad you were able to figure it out. The extra tags are certainly >>>>> worth making this work if it's what we have to do, and shouldn't be >>>>> too much of a problem (until, hopefully, it's fixed on the go side). >>>>> >>>>> On Tue, Oct 26, 2021 at 4:53 PM Robert Burke <[email protected]> >>>>> wrote: >>>>> > >>>>> > With Kyle's help with the additional tagging of the next RC, we have >>>>> validated that this is the currently correct approach. >>>>> > >>>>> > >>>>> https://pkg.go.dev/github.com/apache/beam/sdks/[email protected]/go/pkg/beam?tab=versions >>>>> > >>>>> https://pkg.go.dev/github.com/apache/beam/sdks/[email protected]/go/pkg/beam >>>>> > >>>>> > Or even: >>>>> > https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam >>>>> (links to latest tagged version) >>>>> > >>>>> > The main cost to this approach is doubling the number of tags in the >>>>> tags list: https://github.com/apache/beam/tags which is not ideal, >>>>> but overall a small cost. There's no need for "full publish" of these >>>>> additional tags, so we won't be doubling our "releases" (see >>>>> https://github.com/apache/beam/releases). >>>>> > >>>>> > I'll still be filing a bug against the Go commands since the >>>>> mandatory prefixing is unintuitive, and seems unnecessary. If it becomes >>>>> so, we can always delete the tags from the affected branches, and cease >>>>> the >>>>> behavior going forward. I'll search through the existing Go issues first >>>>> however to see if this has been previously discussed, and report my >>>>> findings here either way. 
>>>>> >
>>>>> > This does require 2 small changes to the release guide: the RC tagging script and the final tagging step:
>>>>> > https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/release/src/main/scripts/choose_rc_commit.sh
>>>>> >
>>>>> > https://github.com/apache/beam/blob/f8660d343fb218cb7acce81ddcc49de0710a0d14/website/www/site/content/en/contribute/release-guide.md#git-tag
>>>>> >
>>>>> > I'll make this change later this week (or early next) assuming there are no objections.
>>>>> >
>>>>> > Thank you all very much for your patience,
>>>>> > Robert Burke
>>>>> > Beam Go Busybody
>>>>> >
>>>>> >
>>>>> > On 2021/10/26 23:01:00, Robert Burke <[email protected]> wrote:
>>>>> > > With much research in reading the Go Modules documentation, I have confirmed what the issue is.
>>>>> > >
>>>>> > > We added the go.mod file to sdks/ under the repo root because it's a cleaner spot for the change, captures the Java and Python container boot code (written in Go) into the module, and avoids conflicts in interpretations of the vendor directory that lives at the root level.
>>>>> > >
>>>>> > > However, we missed that when doing so, the standard version tags would only apply to modules at the root level, not to modules in subdirectories. See https://golang.org/ref/mod#vcs-version, but quoting the important paragraph:
>>>>> > >
>>>>> > > > If a module is defined in a subdirectory within the repository, that is, the module subdirectory portion of the module path is not empty, then each tag name must be prefixed with the module subdirectory, followed by a slash. For example, the module golang.org/x/tools/gopls is defined in the gopls subdirectory of the repository with root path golang.org/x/tools. The version v0.4.0 of that module must have the tag named gopls/v0.4.0 in that repository.
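To make the quoted rule concrete, here is a minimal sketch of how the expected tag name follows from the directory that holds go.mod. The helper name `tagForModule` is hypothetical: it is not part of the go command or of Beam's release scripts, and it only mirrors the rule as quoted above.

```go
package main

import (
	"fmt"
	"strings"
)

// tagForModule is a hypothetical helper for illustration only.
// subdir is the repository-relative directory containing go.mod
// ("" for the repo root); version is the semantic version, e.g. "v2.33.0".
// Per the quoted rule, a module in a subdirectory needs its tag
// prefixed with that subdirectory and a slash.
func tagForModule(subdir, version string) string {
	subdir = strings.Trim(subdir, "/")
	if subdir == "" {
		// Module at the repository root: the plain version tag applies.
		return version
	}
	return subdir + "/" + version
}

func main() {
	fmt.Println(tagForModule("", "v2.33.0"))          // v2.33.0
	fmt.Println(tagForModule("sdks", "v2.33.0"))      // sdks/v2.33.0
	fmt.Println(tagForModule("sdks/", "v2.34.0-RC1")) // sdks/v2.34.0-RC1
}
```

Because Beam's go.mod lives in sdks/, the plain v2.33.0 tag is not the one the go tools look for; the prefixed form is.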
>>>>> > >
>>>>> > > Specifically, for the Go SDK to be fetchable at the right version, we need to have prefixed tags like "sdks/v2.33.0" or "sdks/v2.34.0-RC1".
>>>>> > >
>>>>> > > So, the fix for the Go versioning issue is to amend our Release process (including generating Release Candidate builds) to also add a prefixed version tag with the same version.
>>>>> > >
>>>>> > > I can work with Kyle to validate this for 2.34.0 RC1, and if there are no objections we can back-update the 2.33.0 release branch with such a prefixed tag. At that point I can also write the Official Experimental Exit Blog post.
>>>>> > >
>>>>> > > Thank you all for your patience.
>>>>> > > Robert Burke
>>>>> > >
>>>>> > > On 2021/10/14 00:00:53, Ahmet Altay <[email protected]> wrote:
>>>>> > > > Thank you for the detailed update! Let us know if we can help.
>>>>> > > >
>>>>> > > > On Wed, Oct 13, 2021 at 2:42 PM Robert Burke <[email protected]> wrote:
>>>>> > > >
>>>>> > > > > This is a status update.
>>>>> > > > >
>>>>> > > > > At this point 2.33.0 is released, but there are difficulties with accessing the tagged versions using the standard go tools. It's currently under investigation.
>>>>> > > > >
>>>>> > > > > Using the v2 path in a Go program and then running `go mod tidy` will populate the go.mod file with a pseudo-version rather than the latest tag (v2.33.0) (e.g. the line looks like
>>>>> > > > > require github.com/apache/beam/sdks/v2 v2.0.0-20211013181004-a9120e083008
>>>>> > > > > )
>>>>> > > > >
>>>>> > > > > While this will work, it's not the desired experience for users at this point. The current downside is that, for some reason, the releases are not meaningful targets. However, we retain the other benefits of Go Modules (actual dependency versioning, management by go tools).
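For readers unfamiliar with the difference between the pseudo-version shown in the require line above and a tagged release, the following sketch tells the two apart. It is an approximation under stated assumptions: the regular expression is a simplified pattern written for this note (a 14-digit timestamp followed by a 12-character hex revision), and `looksLikePseudoVersion` is a hypothetical name. The golang.org/x/mod/module package offers a stricter check.

```go
package main

import (
	"fmt"
	"regexp"
)

// pseudoVersionRE is a simplified approximation of the pseudo-version
// shape: vX.Y.Z-[pre.]yyyymmddhhmmss-abcdefabcdef. It is an assumption
// made for illustration, not the go command's own pattern.
var pseudoVersionRE = regexp.MustCompile(
	`^v[0-9]+\.[0-9]+\.[0-9]+-(.*\.)?[0-9]{14}-[0-9a-f]{12}$`)

// looksLikePseudoVersion (hypothetical helper) reports whether v has the
// timestamp-plus-revision shape the go tools generate when no usable
// version tag is found for a commit.
func looksLikePseudoVersion(v string) bool {
	return pseudoVersionRE.MatchString(v)
}

func main() {
	versions := []string{
		"v2.0.0-20211013181004-a9120e083008", // from the require line above
		"v2.33.0",                            // the tag users would expect
		"v2.34.0-RC1",                        // a release candidate tag
	}
	for _, v := range versions {
		fmt.Printf("%-36s pseudo-version: %v\n", v, looksLikePseudoVersion(v))
	}
}
```

Only the first entry is reported as a pseudo-version; the other two are ordinary tagged versions.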
>>>>> > > > >
>>>>> > > > > The issue is some combination of the go tooling [A], that we added a go.mod file outside of the repo root [B], and that we did not increment the major version (v2 -> v3) when adding the go.mod file [C].
>>>>> > > > >
>>>>> > > > > [B] From the Go documentation, this should be legal and fine, even if it's not recommended. This is fortunate because the root of the repo would have played poorly with the root vendor directory, which the go tools have opinions on.
>>>>> > > > >
>>>>> > > > > [C] Incrementing the major version is recommended in the Go Modules documentation when transitioning to Go Modules. However, it never said it was required, nor did it indicate this current failure mode. If anything, this should be documented in those docs, if it's not another bug. We would not necessarily want to declare a global v3 for Beam at this time for just the Go SDK; it would become confusing rather quickly. Notionally there are some larger breaking changes the Java and Python SDKs would want to make in such an event, and thus it's a larger conversation that is out of scope at this time.
>>>>> > > > >
>>>>> > > > > This leaves [A], where some misunderstanding of the documented semantics occurred. I certainly expected the tagged version of the non-root Go module to be inherited from the parent, not wholesale ignored. As a result, I'll be filing a bug against the go tools to determine this, and see what paths forward exist.
>>>>> > > > >
>>>>> > > > > It's my hope to resolve this before we write a proper Experimental Exit blog post for the Go SDK.
>>>>> > > > >
>>>>> > > > > Thank you for your patience, and time.
>>>>> > > > > Robert Burke
>>>>> > > > > Beam Go Busybody
>>>>> > > > >
>>>>> > > > > On 2021/08/23 18:12:00, Robert Burke <[email protected]> wrote:
>>>>> > > > > > With 2.32 the LICENSE issue has been fixed [1], and the SDK now uses Go Modules for dependency management, simplifying Go SDK contributions. [2]
>>>>> > > > > >
>>>>> > > > > > The Module file lives in the sdks/ directory so there's a single Go Module for the whole SDK, tests, examples, and any support code for the container boot builds. This excludes the Go SDK Code katas [3] go modules, which can be updated once 2.33.0 has been released.
>>>>> > > > > >
>>>>> > > > > > PR 15365 [4] adds the SDK containers back to the release builds, and by default uses the release specific container for docker execution jobs. For at least the 2.33.0 release this does mean that manual validation will need to explicitly specify RC versions of containers. However, given that the Go SDK container and worker boot process rarely changes, this is unlikely to be an issue.
>>>>> > > > > >
>>>>> > > > > > At present I'm cleaning up some of the references to experimental, and making it clear that 2.33.0 is the first non-experimental release (even though that's 4-6 weeks out from actual release). CHANGES.md will be updated to note the event, but a larger blogpost will happen after the release goes public.
>>>>> > > > > >
>>>>> > > > > > Cheers,
>>>>> > > > > > Robert Burke
>>>>> > > > > > De facto Beam Go TL.
>>>>> > > > > >
>>>>> > > > > > [1] https://pkg.go.dev/github.com/apache/[email protected]+incompatible/sdks/go/pkg/beam
>>>>> > > > > > [2] https://github.com/apache/beam/pull/15323
>>>>> > > > > > [3] https://github.com/apache/beam/tree/master/learning/katas/go
>>>>> > > > > > [4] https://github.com/apache/beam/pull/15365
>>>>> > > > > >
>>>>> > > > > > On 2021/06/28 23:12:19, Ahmet Altay <[email protected]> wrote:
>>>>> > > > > > > +1, congratulations & thank you!
>>>>> > > > > > >
>>>>> > > > > > > On Tue, Jun 22, 2021 at 3:15 PM Robert Burke <[email protected]> wrote:
>>>>> > > > > > >
>>>>> > > > > > > > Regarding documentation update: Initial PR is https://github.com/apache/beam/pull/15057 which goes up to section ~4.3.
>>>>> > > > > > > > JIRA link for Programming Guide changes: https://issues.apache.org/jira/browse/BEAM-12513
>>>>> > > > > > > >
>>>>> > > > > > > >
>>>>> > > > > > > > On 2021/06/17 14:58:54, Robert Burke <[email protected]> wrote:
>>>>> > > > > > > > > Yup!
>>>>> > > > > > > > >
>>>>> > > > > > > > > My immediate plan is to work on incorporating the Go SDK fully into the Beam Programming Guide. I've audited the guide, and am beginning to add missing content and fill in the Go specific gaps. This will be tied to improving the Go Doc with more Go specific user documentation that isn't appropriate for the BPG.
>>>>> > > > > > > > >
>>>>> > > > > > > > > My audit of the guide is here:
>>>>> > > > > > > > >
>>>>> > > > > > > > > https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090
>>>>> > > > > > > > >
>>>>> > > > > > > > > The other sheets focus on features and tests.
The >>>>> feature page >>>>> > > > > looks >>>>> > > > > > > > worse >>>>> > > > > > > > > than it is, as it was more productive to focus on what >>>>> isn't >>>>> > > > > available >>>>> > > > > > > > than >>>>> > > > > > > > > what is. That's a snapshot of my actual working sheet >>>>> but I'll be >>>>> > > > > > > > updating >>>>> > > > > > > > > it as needed. >>>>> > > > > > > > > >>>>> > > > > > > > > On Thu, Jun 17, 2021, 6:23 AM Ismaël Mejía < >>>>> [email protected]> >>>>> > > > > wrote: >>>>> > > > > > > > > >>>>> > > > > > > > > > Oups forgot to write one question. Will this come >>>>> with revamped >>>>> > > > > > > > > > website instructions/doc for golang too? >>>>> > > > > > > > > > >>>>> > > > > > > > > > On Thu, Jun 17, 2021 at 3:21 PM Ismaël Mejía < >>>>> [email protected]> >>>>> > > > > > > > wrote: >>>>> > > > > > > > > > > >>>>> > > > > > > > > > > Huge +1 >>>>> > > > > > > > > > > >>>>> > > > > > > > > > > This is definitely something many people have >>>>> asked about, so >>>>> > > > > it is >>>>> > > > > > > > > > > great to see it finally happening. >>>>> > > > > > > > > > > >>>>> > > > > > > > > > > On Wed, Jun 16, 2021 at 7:56 PM Kenneth Knowles < >>>>> > > > > [email protected]> >>>>> > > > > > > > wrote: >>>>> > > > > > > > > > > > >>>>> > > > > > > > > > > > +1 awesome >>>>> > > > > > > > > > > > >>>>> > > > > > > > > > > > On Wed, Jun 16, 2021 at 10:33 AM Robert Burke < >>>>> > > > > [email protected] >>>>> > > > > > > > > >>>>> > > > > > > > > > wrote: >>>>> > > > > > > > > > > >> >>>>> > > > > > > > > > > >> Sounds reasonable to me. I agree. We'll aim to >>>>> get those (Go >>>>> > > > > > > > modules >>>>> > > > > > > > > > and LICENSE issue) done before the 2.32 cut, and >>>>> certainly >>>>> > > > > before the >>>>> > > > > > > > 2.33 >>>>> > > > > > > > > > cut if release images aren't added to the 2.32 >>>>> process. 
>>>>> > > > > > > > > > > >>
>>>>> > > > > > > > > > > >> Regarding Go Generics: at some point in the future, we may want a harder break between a newer Generic first API and the current version, but there's no rush. Generics/TypeParameters in Go aren't identical to the feature referred to by that term in Java, C++, Rust, etc., so it'll take a bit of time for that expertise to develop.
>>>>> > > > > > > > > > > >>
>>>>> > > > > > > > > > > >> However, by the current nature of Go, we had to have pretty sophisticated reflective analysis to handle DoFns and map them to their graph inputs. So, adding new helpers like KV, emitter, and Iterator types shouldn't be too difficult. Changing Go SDK internals to use generics (like the implementation of Stats DoFns like Min, Max, etc.) could also be done transparently to most users, and certainly any of the framework for execution time handling (the "worker's SDK harness") could be cleaned up if need be. Finally, adding more sophisticated DoFn registration and code generation would be able to replace the optional code generator entirely, saving some users a `go generate` step and making improved execution performance simpler to get.
>>>>> > > > > > > > > > > >> >>>>> > > > > > > > > > > >> Changing things like making a Type Parameterized >>>>> > > > > PCollection, >>>>> > > > > > > > would >>>>> > > > > > > > > > be far more involved, as would trying to use some >>>>> kind of Apply >>>>> > > > > > > > format. The >>>>> > > > > > > > > > lack of Method Overrides prevents the apply chaining >>>>> approach. >>>>> > > > > Or at >>>>> > > > > > > > least >>>>> > > > > > > > > > prevents it from working simply. >>>>> > > > > > > > > > > >> >>>>> > > > > > > > > > > >> Finally, Go Generics won't be available until >>>>> Go 1.18, >>>>> > > > > which isn't >>>>> > > > > > > > > > until next year. See >>>>> https://blog.golang.org/generics-proposal >>>>> > > > > for >>>>> > > > > > > > > > details. >>>>> > > > > > > > > > > >> >>>>> > > > > > > > > > > >> Go 1.17 https://tip.golang.org/doc/go1.17 does >>>>> include a >>>>> > > > > Register >>>>> > > > > > > > > > calling convention, leading to a modest performance >>>>> improvement >>>>> > > > > across >>>>> > > > > > > > the >>>>> > > > > > > > > > board. >>>>> > > > > > > > > > > >> >>>>> > > > > > > > > > > >> Cheers, >>>>> > > > > > > > > > > >> Robert Burke >>>>> > > > > > > > > > > >> >>>>> > > > > > > > > > > >> On 2021/06/15 18:10:46, Robert Bradshaw < >>>>> > > > > [email protected]> >>>>> > > > > > > > wrote: >>>>> > > > > > > > > > > >> > +1 to declaring Golang support out of >>>>> experimental once >>>>> > > > > the Go >>>>> > > > > > > > > > Modules >>>>> > > > > > > > > > > >> > issues are solved. I don't think an SDK needs >>>>> to support >>>>> > > > > every >>>>> > > > > > > > > > feature >>>>> > > > > > > > > > > >> > to be accepted, especially now that we can do >>>>> > > > > cross-language >>>>> > > > > > > > > > > >> > transforms, and Go definitely supports enough >>>>> to be quite >>>>> > > > > > > > useful. 
>>>>> > > > > > > > > > (WRT >>>>> > > > > > > > > > > >> > streaming, my understanding is that Go >>>>> supports the >>>>> > > > > streaming >>>>> > > > > > > > model >>>>> > > > > > > > > > > >> > with windows and timestamps, and runs fine on >>>>> a streaming >>>>> > > > > > > > runner, >>>>> > > > > > > > > > even >>>>> > > > > > > > > > > >> > if more advanced features like state and >>>>> timers aren't yet >>>>> > > > > > > > > > available.) >>>>> > > > > > > > > > > >> > >>>>> > > > > > > > > > > >> > This is a great milestone. >>>>> > > > > > > > > > > >> > >>>>> > > > > > > > > > > >> > On Tue, Jun 15, 2021 at 10:12 AM Tyson >>>>> Hamilton < >>>>> > > > > > > > [email protected]> >>>>> > > > > > > > > > wrote: >>>>> > > > > > > > > > > >> > > >>>>> > > > > > > > > > > >> > > WOW! Big news. >>>>> > > > > > > > > > > >> > > >>>>> > > > > > > > > > > >> > > I'm supportive of leaving experimental >>>>> status after Go >>>>> > > > > Modules >>>>> > > > > > > > > > are completed and the LICENSE issue is resolved. I >>>>> don't think >>>>> > > > > that >>>>> > > > > > > > lacking >>>>> > > > > > > > > > streaming support is a blocker. The other thing I >>>>> checked to see >>>>> > > > > was if >>>>> > > > > > > > > > there were metrics available on >>>>> metrics.beam.apache.org, >>>>> > > > > specifically >>>>> > > > > > > > for >>>>> > > > > > > > > > measuring code health via post-commit over time, >>>>> which there are >>>>> > > > > and >>>>> > > > > > > > the >>>>> > > > > > > > > > passing test rate is high (Huzzah!). The one thing >>>>> that >>>>> > > > > surprised me >>>>> > > > > > > > from >>>>> > > > > > > > > > your summary is that when Go introduces generics it >>>>> won't result >>>>> > > > > in any >>>>> > > > > > > > > > backwards incompatible changes in Apache Beam. 
>>>>> That's great >>>>> > > > > news, but >>>>> > > > > > > > does >>>>> > > > > > > > > > it mean there will be a need to support both >>>>> non-generic and >>>>> > > > > generic >>>>> > > > > > > > APIs >>>>> > > > > > > > > > moving forward? It seems like generics will be >>>>> introduced in the >>>>> > > > > Go >>>>> > > > > > > > 1.17 >>>>> > > > > > > > > > release (optimistically) in August this year. >>>>> > > > > > > > > > > >> > > >>>>> > > > > > > > > > > >> > > >>>>> > > > > > > > > > > >> > > >>>>> > > > > > > > > > > >> > > On Thu, Jun 10, 2021 at 5:04 PM Robert >>>>> Burke < >>>>> > > > > > > > [email protected]> >>>>> > > > > > > > > > wrote: >>>>> > > > > > > > > > > >> > >> >>>>> > > > > > > > > > > >> > >> Hello Beam Community! >>>>> > > > > > > > > > > >> > >> >>>>> > > > > > > > > > > >> > >> I propose we stop calling the Apache Beam >>>>> Go SDK >>>>> > > > > > > > experimental. >>>>> > > > > > > > > > > >> > >> >>>>> > > > > > > > > > > >> > >> This thread is to discuss it as a >>>>> community, and any >>>>> > > > > > > > conditions >>>>> > > > > > > > > > that remain that would prevent the exit. >>>>> > > > > > > > > > > >> > >> >>>>> > > > > > > > > > > >> > >> tl;dr; >>>>> > > > > > > > > > > >> > >> Ask Questions for answers and links! I >>>>> have both. >>>>> > > > > > > > > > > >> > >> This entails including it officially in >>>>> the Release >>>>> > > > > process, >>>>> > > > > > > > > > removing the various "experimental" text throughout >>>>> the repo etc, >>>>> > > > > > > > > > > >> > >> and otherwise treating it like Python and >>>>> Java. Some Go >>>>> > > > > > > > specific >>>>> > > > > > > > > > tasks around dep versioning. >>>>> > > > > > > > > > > >> > >> >>>>> > > > > > > > > > > >> > >> The Go SDK implements the beam model >>>>> efficiently for >>>>> > > > > most >>>>> > > > > > > > batch >>>>> > > > > > > > > > tasks, including basic windowing. 
>>>>> > > > > > > > > > > >> > >> Apache Beam Go jobs can execute, and are tested, on all Portable runners.
>>>>> > > > > > > > > > > >> > >> The core APIs are not going to change in incompatible ways going forward.
>>>>> > > > > > > > > > > >> > >> Scalable transforms can be written through SplittableDoFns or via Cross Language transforms.
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> The SDK isn't 100% feature complete, but keeping it experimental doesn't help with that any further.
>>>>> > > > > > > > > > > >> > >> Communities grow through contributions and use, and experimental markers dissuade users.
>>>>> > > > > > > > > > > >> > >> There's plenty to do in order to expand what can be done with the SDK. (Contributions welcome)
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> Why Exit Experimental now?
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> Typically when we call an SDK or API Experimental, it's because there's a risk that the API or behaviors may change significantly. This, in turn, leads to additional work for users of the SDK on every release, which leads to sticking to older versions or forking to preserve behavior. Version updates should be looked forward to, and viewed as having little risk.
>>>>> > > > > > > > > > > >> > >> Further, while there's been previous discussion about what the "low bar" is for a new SDK, it hasn't been summarily applied to the Go SDK. I feel this has hurt development and contribution of new SDK languages (inherent difficulty of SDK development notwithstanding).
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> When the SDK was designed, it wasn't entirely clear what the Beam Model should look like in an opinionated language like Go. Their initial take (see https://s.apache.org/beam-go-sdk-design-rfc [0]) goes into detail on what it means for a language without Generics, or overloading, or inheritance to implement the Beam model. One could largely throw away static types (like Python), but this approach rings hollow for Go. It would not do if the approach couldn't grow and scale to the Beam Model. It's also hard to tell if an API is any good before there are users.
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> Further, in the early days of Portability, there wasn't a way to write scalable DoFns, dynamically or otherwise.
>>>>> > > > > > > > > > > >> > >> It's an incredible bottleneck to need to do all initial fanout of work on a single machine and write everything to a Reshuffle, just in order to scale up. Without being able to scale, Beam is little more than overhead.
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> At this point, both of these needs are met within the Go SDK for open source.
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> Background
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> The Go SDK has been a part of the Beam repo for a few years now, since it was accidentally merged into master. Since then it's been called experimental, and not officially part of the releases.
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> Of the SDKs, it was always designed around Beam Portability first. It never had any "Legacy" (SDK x Runner specific) workers. It's always used the Beam Pipeline protos and FnAPI to execute jobs, first with some very experimental code on Dataflow, but now on all portable supported runners, like Flink, Spark, the Python Portable runner, and Dataflow.
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> API Stability
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> The Go SDK hasn't meaningfully changed its user API for DoFn and pipeline construction since it was first merged in, and there are no changes to that on the horizon that can't be made in a backwards compatible manner. Largely these are related to New Features, or usability improvements enabled by the advent of Go Generics (think of "real" KV, emitter, and iterator types).
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> It's an open secret that the Go SDK has largely been under work for use within Google. Its use there is called FlumeGo, representing the Apache Beam Go SDK running on top of Flume, Google's batch pipeline processing engine. Thus most of the focus has been on improving batch execution. FlumeGo sees ample use today, and there hasn't been a call for fundamental changes to the API for ergonomic or usability concerns.
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> Scalability
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> Google could get away without the Go SDK having an SDK side scalability solution as a result of its integration with Flume. However, those days are now past.
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> The Go SDK now supports SplittableDoFns along with Dynamic Splitting, which supports writing scalable batch transforms natively in the Go SDK.
>>>>> > > > > > > > > > > >> > >> The SDK also supports Cross Language Transforms, with Beam Schema encodings. With it, production hardened transforms from Java and Python are a wrapper away.
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> Presently, Daniel Oliveira (who implemented the SDF side work, and completed the Xlang work) is adding a wrapper for the Java Kafka IO using Cross Language Transforms, which has often been requested. This will also enable use of the Beam SQL transforms that Java enables.
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> Features
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> The Go SDK implements the Beam core. It implements standard coders, allows for user DoFns and CombineFns, and gives access to core transforms like Flatten and GroupByKey, and to features like Side Inputs, Windowing, and User Metrics.
>>>>> > > > > > > > > > > >> > >> Basic windowing will be fully supported for batch, even through lifted combines, in the 2.32.0 release.
>
> All of the above enables Beam Go to be versatile for batch execution on portable runners, and for simple streaming pipelines.
>
> Repo Testing
>
> On precommit the Go SDK runs all its unit tests. On top of that, it runs all its integration tests against the Python Portable runner, making it quick and robust to detect breaking changes without overspending community resources. Those same tests are also run against Dataflow, Flink, and Spark.
>
> The tests are executable against all runners via the appropriate Go commands (if you've stood up your own job management server), or Gradle commands (which will spin up runner instances for you). Documentation for executing tests and adding new ones is on the wiki. [2] They are accessible to Go developers as they're implemented with the standard Go testing tools.
>
> Shortcomings
> That said, there's still much to do. Let me briefly tell you what doesn't work, and it's up to you to weigh whether they block being out of experimental.
>
> At present, only textio has been implemented as a Splittable DoFn.
> Once the Kafka wrapper is merged in, it will serve as the first example for future contributions of new transform wrappers for the Go SDK.
> Transforms and IOs are lacking, but at this point users are empowered to write their own DoFns or wrap existing transforms for Cross Language use.
>
> In the core SDK, more streaming focused features have yet to be implemented, but they're largely additions to what exists already rather than total rebuilds. Much of the work is defining how a user specifies their desires, and turning those into the appropriate FnAPI requests at execution time. Back in October I wrote at length on the wiki [1] what's missing for additional streaming features.
>
> While we have bolstered our testing recently, there's likely still more we could test to improve our confidence in the SDK, in particular regarding the included transforms libraries and examples.
>
> Moving Forward
>
> My immediate plan is to work on incorporating the Go SDK fully into the Beam Programming Guide. I've audited the guide [3], and am beginning to add missing content and fill in the Go specific gaps. This will be tied to improving the GoDoc with more Go specific user documentation that isn't appropriate for the BPG, and resolving the LICENSE issue around the public display of that GoDoc.
>
> If this proposal is accepted by a binding vote, I will incorporate the SDK into the release process, and remove the "experimental" language around the SDK. This largely entails updating the release scripts to also build and publish the Go SDK Docker containers.
> As for releasing the code, we're technically already doing so whenever we tag a release branch [4].
>
> The clearest signal to the Go community however will be migrating the SDK to use Go Modules for dependency version control, which Daniel is planning on working on after his Kafka task.
> This will put our repo infrastructure, SDK contributors, and users on the same footing when it comes to dependency management. It will remove the "+incompatible" tags one sees on the pkg.go.dev list at [4].
>
> I'm very happy to answer any questions you might have about the SDK, and provide additional links as needed. I intentionally avoided a link barrage in this email, as they can distract from the point: The SDK is ready for folks to use it, we need to tell them that they can rather than they shouldn't.
>
> Robert Burke
> Defacto Beam Go TL
>
> [0] https://s.apache.org/beam-go-sdk-design-rfc
> [1] https://cwiki.apache.org/confluence/display/BEAM/Supporting+Streaming+in+the+Go+SDK
> [2] https://cwiki.apache.org/confluence/display/BEAM/Go+Tips
> [3] https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090 (SDK Audit sheet)
> [4] https://pkg.go.dev/github.com/apache/beam/sdks/go/pkg/beam?tab=versions
