Oh dang. Thanks for mentioning that! Here's an open copy of the versioning
thoughts doc, though there shouldn't be any surprises from the points I
mentioned above.

https://docs.google.com/document/d/1ZjP30zNLWTu_WzkWbgY8F_ZXlA_OWAobAD9PuohJxPg/edit#heading=h.drpipq762xi7
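
For anyone who can't open the doc, the short versioning points in the quoted summary below can be sketched in plain Go. This is only an illustration under stated assumptions: it assumes the SDK would be tagged v0.Minor with Minor tracking the Beam release, and the names Version, ParseVersion, PromisesCompatibility and SDKVersionFor are made up for this example. They are not part of the Go SDK or the Go toolchain.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Version is a hypothetical, simplified semantic version (vMAJOR.MINOR.PATCH).
type Version struct{ Major, Minor, Patch int }

// ParseVersion parses strings such as "v2.11.0" (the leading "v" is optional).
func ParseVersion(s string) (Version, error) {
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) != 3 {
		return Version{}, fmt.Errorf("want vMAJOR.MINOR.PATCH, got %q", s)
	}
	var n [3]int
	for i, p := range parts {
		v, err := strconv.Atoi(p)
		if err != nil || v < 0 {
			return Version{}, fmt.Errorf("bad component %q in %q", p, s)
		}
		n[i] = v
	}
	return Version{n[0], n[1], n[2]}, nil
}

// String formats the version with a leading "v", as Go module tags do.
func (v Version) String() string {
	return fmt.Sprintf("v%d.%d.%d", v.Major, v.Minor, v.Patch)
}

// PromisesCompatibility reports whether the version is v1 or later, which is
// where the "no breaking changes" expectation starts to apply.
func (v Version) PromisesCompatibility() bool { return v.Major >= 1 }

// SDKVersionFor maps a Beam release to the assumed v0.<Beam minor>.0 SDK tag.
func SDKVersionFor(beam Version) Version {
	return Version{Major: 0, Minor: beam.Minor, Patch: 0}
}

func main() {
	beam, err := ParseVersion("v2.11.0")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	sdk := SDKVersionFor(beam)
	fmt.Printf("Beam %v -> Go SDK %v (compatibility promise: %t)\n",
		beam, sdk, sdk.PromisesCompatibility())
}
```

The point of the mapping is that the SDK tag stays below v1 while its surface can still change, yet it remains easy to see which Beam release it was cut from.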

On Wed, 17 Apr 2019 at 21:20, Nathan Fisher wrote:

> Hi Robert,
>
> Great summary on the current state of play. FYI the referenced G doc
> doesn't appear to be visible to people outside the org by default.
>
> Great to hear the Go SDK is still getting love. I last looked at it in
> September-October of last year.
>
> Cheers,
> Nathan
>
> On Wed, 17 Apr 2019 at 20:27, Lukasz Cwik wrote:
>
>> Thanks for the in-depth summary.
>>
>> On Mon, Apr 15, 2019 at 4:19 PM Robert Burke wrote:
>>
>>> Hi Thomas! I'm so glad you asked!
>>>
>>> The status of the Go SDK is complicated, so this email can't be brief.
>>> There are several dimensions to consider: as a Go Open Source Project,
>>> User Libraries and Experience, and Beam Features.
>>>
>>> I'm going to be updating the roadmap later this month when I have a
>>> spare moment.
>>>
>>> *tl;dr;*
>>> I would *love* help in improving the Go SDK, especially around
>>> interactions with Java/Python/Flink. Java and I do not have a good working
>>> relationship for operational purposes, and the last time I used Python, I
>>> had to re-image my machine. There's lots to do, but shouting out tasks to
>>> the void is rarely as productive as it is cathartic. If there's an offer to
>>> help, and a preference for/experience with something to work on, I'm
>>> willing to find something useful to get started on for you.
>>>
>>> (Note: The following are simply my opinion as someone who works with the
>>> project weekly as a Go programmer, and should not be treated as demands or
>>> gospel. I just don't have anyone to talk about Go SDK issues with, and my
>>> previous discussions have largely seemed to fall on uninterested ears.)
>>>
>>> *The SDK can be considered Alpha when all of the following are true:*
>>> * The SDK is tested by the Beam project on a ULR and on Flink as well as
>>> Dataflow.
>>> * The IOs have received some love to ensure they can scale (either
>>> through SDF or reshuffles), and be portable to different environments (eg.
>>> using the Go Cloud Development Kit (CDK) libraries).
>>>    * Cross-Language IO support would also be acceptable.
>>> * The SDK is using Go Modules for dependency management, marking it as
>>> version 0.Minor (where Minor should probably track the mainline Beam minor
>>> version for now).
>>>
>>> *We can move to calling it Beta when all of the following are true:*
>>> * All implemented Beam features are meaningfully tested on the
>>> portable runners (eg. a proper "Validates Runner" suite exists in Go).
>>> * The SDK is properly documented on the Beam site, and in its Go Docs.
>>>
>>> After this, I'll be more comfortable recommending it as something folks
>>> can use for production.
>>> That said, there are happy paths that are useable today in batch
>>> situations.
>>>
>>> *Intro*
>>> The Go SDK is a purely Beam Portable SDK. If it runs on a distributed
>>> system at all, it's being run portably. Currently it's regularly tested on
>>> Google Cloud Dataflow (though Dataflow doesn't officially support the SDK
>>> at this time), and on its own single bundle Direct Runner (intended for
>>> unit testing purposes). In addition, it's being tested at scale within
>>> Google, on an internal runner, where it presently satisfies our performance
>>> benchmarks, and correctness tests.
>>>
>>> I've been working on cases to make the SDK suitable for data processing
>>> within Google. This unfortunately makes my contributions more towards
>>> general SDK usability, documentation, and performance, rather than "making
>>> it usable outside Google". Note this also precludes necessary work to
>>> resolve issues with running Go SDK pipelines on Google Cloud Dataflow. I
>>> believe that the SDK must become a good member of both the Go ecosystem
>>> and the Beam ecosystem.
>>>
>>> Improved Go Docs are on their way, and Daniel Oliviera has been helping
>>> me make the "getting started" experience better by improving pipeline
>>> construction time error messages.
>>>
>>> Finally, many of the following issues have JIRAs already, but some don't.
>>> It would take me time I don't have to audit and line everything up for
>>> this email, so please look before you file JIRAs for things mentioned
>>> below, should the urge strike you.
>>>
>>>
>>> *As a Go Open Source Project*
>>> As an open source project written in Go, the SDK is lagging on adopting
>>> Go Modules for Dependency Management and Versioning.
>>>
>>> Using Go Modules would ensure that what the Beam project
>>> infrastructure is testing is what users are getting. I'm very happy to
>>> elaborate on this, and have a bit I wrote on the topic two months
>>> ago [1]. But I loathe sending out plans for things that I don't have time
>>> to work on, so it's only coming to light now.
>>>
>>> The short points are:
>>> * Go has been opinionated about versioning since Go 1.11, when Modules
>>> were introduced. They allow for reproducible builds with versioned deps,
>>> supported by the Go language tools.
>>> * Packages at version 1 and greater are beholden to not make breaking
>>> changes. We're not there yet with the SDK (certainly not a 2.11 product),
>>> so IMO the SDK should be considered v0.X.
>>> * I don't think it's reasonable to move SDK languages in lockstep with
>>> the project. Eg. The Go language is considering adopting Generics, which
>>> may necessitate a Major Version Change to the SDK user surface as it's
>>> modified to support them. It's not reasonable to move all of beam to a new
>>> version due to a single language surface.
>>>    * This isn't an issue since it reads: the Go SDK version X, runs
>>> against portable beam runners at version Y.
>>>
>>> See a recent email discussion thread [2] for other factors relating to
>>> Gradle.
>>>
>>> *User Libraries (IOs, Transforms)*
>>> There's a lack of testing around the IOs and Transforms in the SDK. In
>>> some cases, not even unit tests. Very little time has been spent by anyone
>>> to bring these to production quality.
>>>
>>> *The best route to production IOs right now would be to work on Cross
>>> Language IO support with the Go SDK. I imagine it would be similar to what
>>> Python is doing.*
>>>
>>> The Bounded IOs that exist are largely "toys" not written for serious
>>> production use. For Bounded cases, this is largely due to the lack of SDF
>>> or using reshuffle judiciously, or leveraging other known patterns to
>>> scalably read data. You'll note they aren't meaningfully tested anywhere
>>> either.
>>>
>>> For Unbounded IOs, there's only 1 presently, and that's the Google Cloud
>>> PubSub IO. It's not portable. It can't be portable until we've implemented
>>> State+Timers, or SDFs. At present, it only works on Dataflow, and does so
>>> with runner substitution. As such, it uses the same pubsub connector that
>>> Streaming Dataflow jobs use. Interestingly, this means it can scale
>>> properly, and is technically the only one that can scale properly.
>>> Unfortunately, it only works on Dataflow.
>>>
>>> My work on using the Beam Go SDK inside Google uses a variant of Cross
>>> Language IO. This is one reason why I haven't spent any time on the IOs,
>>> because they aren't necessary inside Google, and there's not been a use case
>>> I could contrive to spend the time to fix them up so far.
>>>
>>> *General SDK Code Quality*
>>> In my opinion the SDK is presently reasonable on general code quality.
>>> Most critical aspects have tests, and from Google internal testing on
>>> complex and large amounts of data, the SDK is performant, once a bit
>>> of code generation is done to avoid reflection on the hot path.
>>>
>>> Various combinations of features should be vetted together better. Eg.
>>> Using composites wrapping various other beam primitives. This was an issue
>>> resolved recently for CoGBKs.
>>>
>>> *Beam Features*
>>> The SDK is largely usable for Batch Pipelines. I know this since that's
>>> what I'm ensuring is the case for a Google internal runner. I know the
>>> following "classes of feature" work for the batch use cases, to varying
>>> levels of documentation and testing.
>>> * DoFns
>>> * CombineFns
>>>   * Combiner Lifting
>>> * CoGroupByKey (Joins)
>>> * Side Inputs
>>> * User Defined Coders
>>> * Global Windows
>>> * User Metrics (though they need to move to the new beam Metrics protos)
>>>
>>> Streaming is another story. The following aren't implemented:
>>> * State + Timers + Triggers
>>>   * Necessary for portable pubsub IOs for example.
>>> * SDFs aren't implemented yet
>>>    * Necessary for
>>> * Windows
>>>    * Session Windows
>>>    * Custom WindowFns
>>>
>>> I haven't run anything in streaming mode, so there are likely other
>>> features and considerations I'm missing.
>>>
>>> The following are implemented but not meaningfully tested:
>>> * Windows
>>>   * Fixed Windowing
>>>   * Sliding Windows
>>>
>>> Others, like Large Iterables support or Schemas, are not yet implemented
>>> either. There are likely more, but I'd need to list everything from the
>>> compatibility matrix.
>>>
>>> *What I'm spending my time on*
>>> Documenting, and debugging Google-internal user issues. The following
>>> artifacts will be produced externally in the next few months:
>>> * Improved user documentation/programming on the Go SDK (targeted to
>>> folks who know Go, but not Beam, or any distributed programming).
>>> * An SDK contribution guide to be put on the Wiki, focusing on the "Life
>>> of a Pipeline" from the user controller to the worker perspective, and
>>> mapping each of those parts to where the SDK deals with them.
>>> This should enable others to contribute Beam features to the SDK.
>>> * The Versioning Issue mentioned above, it's finicky.
>>> * Large (State Backed) Iterable Support
>>>
>>> *What I'd love help with*
>>> 1. Getting the existing suite of SDK integration tests running against a
>>> ULR or Flink (there are JIRAs for these).
>>> 2. Improving existing IOs, adding tests for existing features over
>>> adding new ones.
>>>    a) Migrate the existing IOs to use the Go CDK where possible (needs
>>> to wait for the Versioning/GoModules/Gradle issue to be resolved though).
>>>
>>> Your friendly neighbourhood Distributed Gopher Wrangler,
>>> Robert Burke (@lostluck)
>>>
>>> [1]
>>> https://docs.google.com/document/d/1nB5qCarN0jmo40zH1J0icZa6Wyb0v4u08AdG4WDTTEY/edit
>>>
>>> [2]
>>> https://lists.apache.org/thread.html/8952f546b449ce8682db221e7688db546e25145c31cd835ed88ad172@%3Cdev.beam.apache.org%3E
>>>
>>> On Sat, 13 Apr 2019 at 11:30, Thomas Weise wrote:
>>>
>>>> How "experimental" is the Go SDK? What are the major work items to
>>>> reach MVP? How close are we to being able to run, let's say, wordcount
>>>> on the portable Flink runner?
>>>>
>>>> How current is the roadmap [1]? JIRA [2] could suggest that there is a
>>>> lot of work left to do?
>>>>
>>>> Thanks,
>>>> Thomas
>>>>
>>>> [1] https://beam.apache.org/roadmap/go-sdk/
>>>> [2]
>>>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20and%20component%20%3D%20sdk-go%20and%20resolution%20%3D%20Unresolved%20
>>>>
>>>>
>
> --
> Nathan Fisher
>  w: http://junctionbox.ca/
>
