[
https://issues.apache.org/jira/browse/BEAM-5379?focusedWorklogId=230458&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230458
]
ASF GitHub Bot logged work on BEAM-5379:
----------------------------------------
Author: ASF GitHub Bot
Created on: 20/Apr/19 17:59
Start Date: 20/Apr/19 17:59
Worklog Time Spent: 10m
Work Description: lostluck commented on issue #8354: [BEAM-5379] Go
Modules versioning support
URL: https://github.com/apache/beam/pull/8354#issuecomment-485147347
I'm fluent in wall of text, and am happy to provide all the context I have
available. Let's get this done!
+1 to discussing here at the code. I agree entirely.
On Go Gradle:
* I am ambivalent about keeping the plugin versus switching to direct
go tool usage (likely shell scripts), so long as the tasks can continue to be
triggered via Gradle. It simplifies things for test infrastructure that way.
* There was a bit of [discussion on ditching gogradle recently, and I got
reasonable
pushback](https://lists.apache.org/thread.html/2754c89ef5f4ad850b98fa558ed245a5bdaa099c04bdcfbd0a34171f@%3Cdev.beam.apache.org%3E)
* If we can address those concerns (I think we can), we can probably do it
with the scripty + go tools approach.
* Go Modules (committing both a go.mod & go.sum) do accomplish the goal of
hermetic reproducible builds. There's a reasonable fear of external tooling by
those who work with gradle more regularly.
* Go Gradle likely sets up a GOPATH, which would disable modules by default
anyway unless the right environment variable (GO111MODULE) is set, and then
vendors the packages listed in the lock file.
* You can find all the different Go "targets" by [searching for
applyGoNature in this
repo](https://github.com/apache/beam/search?q=applyGoNature&unscoped_q=applyGoNature).
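To make the GOPATH point above concrete, here is a minimal sketch of how the go tool in Go 1.11 and 1.12 decides whether module mode is active. The function name and parameters are my own for illustration, not part of any Go or Beam API, and later Go releases changed these defaults.

```go
package main

import "fmt"

// moduleModeEnabled models the Go 1.11/1.12 decision for module mode.
// This is an illustrative helper, not a real go tool API.
//
// go111module is the value of GO111MODULE ("", "auto", "on" or "off").
// insideGOPATH reports whether the working directory is under $GOPATH/src.
// hasGoMod reports whether a go.mod exists in the directory or a parent.
func moduleModeEnabled(go111module string, insideGOPATH, hasGoMod bool) bool {
	switch go111module {
	case "on":
		return true
	case "off":
		return false
	default: // "" or "auto": modules only outside GOPATH with a go.mod
		return !insideGOPATH && hasGoMod
	}
}

func main() {
	// A build rooted inside a GOPATH, as a plugin-managed build might be.
	fmt.Println(moduleModeEnabled("", true, true)) // false
	// The same build with GO111MODULE=on.
	fmt.Println(moduleModeEnabled("on", true, true)) // true
}
```

So a build that runs inside a GOPATH would need GO111MODULE=on set explicitly before go.mod and go.sum have any effect.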
On the v0.X:
* Sadly, as much as I like saying "it's in the documentation",
discoverability of those statements is hard. The version number (such as it
will display in a user's go.mod) is an unambiguous statement that's much harder
to miss.
* "There be Dragons" is largely due to the lack of mature native Go IO
connectors. As I outlined recently in a dev list post on the [Go SDK
status](https://lists.apache.org/thread.html/8f729da2d3009059d7a8b2d8624446be161700dcfa953939dd3530c6@%3Cdev.beam.apache.org%3E),
the connectors have received very little, if any love, and aren't currently
tested meaningfully.
* I can't in good conscience recommend users depend on those IOs at this
time, even implicitly via the version numbering, since they won't do the job
Beam promises, which is to allow you to write code that can scale.
* It is possible to write scalable batch sinks, with certain techniques
(resharding via CoGBKs, expanding to 1 DoFn per input file, etc), but I'm
biased to either Cross Language IO, or runner availability of Splittable DoFns
(even just the initial splitting), so we can implement the Go support for that,
and directly have the final solution, rather than migrate through a few stages.
* There's also the matter of importing packages that are v2: They need to
have the v2 in the import path etc.
* The actual version number largely speaks to the passage of time, but it is
useful for runners and services to be able to say "The Go SDK versions X-Z all
work on the FooBar runner!", and they can also maintain their own testing to be
able to deprecate support for such versions later.
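As a small illustration of the v2 import path point above, here is a sketch of the Go Modules convention: major versions 0 and 1 use the bare module path, while v2 and later carry a "/vN" suffix. The module path below is a placeholder, not the Beam SDK's actual path.

```go
package main

import (
	"fmt"
	"strconv"
)

// modulePath returns the module path for a given major version under the
// Go Modules import path convention: v0 and v1 use the bare path, while
// v2 and later append a "/vN" suffix.
func modulePath(base string, major int) string {
	if major < 2 {
		return base
	}
	return base + "/v" + strconv.Itoa(major)
}

func main() {
	const base = "example.com/some/sdk" // hypothetical module path
	for _, major := range []int{0, 1, 2} {
		fmt.Printf("v%d -> %s\n", major, modulePath(base, major))
	}
}
```

Staying on v0.X therefore means users never have to rewrite their import paths until a real compatibility commitment is made.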
On Python and Java SDK versions being in sync:
* This is a nuance from the history of the project. Python and Java have
"pre-portability" implementations, where the SDK and the runner were tightly
coupled on the worker. That is, every SDK+Runner pair required its own Worker.
As you can imagine, this doesn't scale. Everything (runners + SDKs) *had* to be
at the same version.
* Portability (via the FnAPI) allows each SDK to have its own worker side
harness, that communicates to a runner side harness. Then each runner and
each SDK merely maintains its own half, simplifying both SDK and runner
development.
* The Go SDK has only ever had a "portable" implementation.
On Independent Language Versions:
* I personally don't see it making sense long term for SDKs to be on the
same version. It's useful for users to know "the SDK didn't change, so my
pipeline code will behave the same."
* Why should the Go SDK get a patch release when Java has a bug to fix?
* If Generics ever happen for Go, and adopting them requires making API
breaking changes, why should Java and Python get a major version bump for the
sake of Go?
* What matters is that it's compatible with runners on Beam vFoo.Bar, and
that can be tested independently.
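A compatibility statement like "Go SDK versions X-Z work on the FooBar runner" could be checked mechanically. The sketch below is only an illustration: the types, the runner and the version numbers are all hypothetical, and nothing like this exists in Beam as far as I know.

```go
package main

import "fmt"

// version is a minimal major.minor.patch triple.
type version struct{ major, minor, patch int }

// less reports whether v sorts before w.
func (v version) less(w version) bool {
	if v.major != w.major {
		return v.major < w.major
	}
	if v.minor != w.minor {
		return v.minor < w.minor
	}
	return v.patch < w.patch
}

// supportedRange is an inclusive range of SDK versions a runner has tested.
type supportedRange struct{ min, max version }

// contains reports whether v falls within the inclusive range.
func (r supportedRange) contains(v version) bool {
	return !v.less(r.min) && !r.max.less(v)
}

func main() {
	// Hypothetical: a "FooBar" runner declares support for SDK v0.3.0 through v0.5.2.
	fooBar := supportedRange{min: version{0, 3, 0}, max: version{0, 5, 2}}
	fmt.Println(fooBar.contains(version{0, 4, 1})) // true
	fmt.Println(fooBar.contains(version{0, 6, 0})) // false
}
```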
On the version of Go being used:
* Currently, IIRC the Jenkins machines are using Go 1.10. Our GoGradle is
currently [configured to use
go1.10](https://github.com/apache/beam/blob/9c8a8dc241296ad5003124c8bcb94a50bb53beb5/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L1310)
and I'm not sure if we updated the Jenkins machines to have Go 1.12 when they
recently broke.
* We *should* provide some kind of statement of what versions of Go are
supported, but this is another reason for being v0.X: we haven't made those
kinds of commitments yet.
* There's similar work being done for tracking supported Java and Python
versions, so this seems reasonable.
On having 2 modules, rather than one:
* It's useful to be able to "retarget" the tests independently of the
version of the Go SDK being used, e.g. to find regressions, run performance
comparisons against different minor versions, etc.
* Validating compatibility of Go SDK versions X against Runner Version Y
* Sets us up better for the non-zero eventuality that the repo splits per
SDK language once portability is complete. There would still be some work to do
of course WRT path changes (unless we adopt Vanity Import URLs), but the
modules would be independent.
* Clean split of dependencies for things used to vet the SDK against
different runners, vs what users actually need to use the SDK.
I might be repeating myself from the doc a bit, but re-writing is handy in
making thoughts more concrete.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 230458)
Time Spent: 1h 50m (was: 1h 40m)
> Go Modules versioning support
> -----------------------------
>
> Key: BEAM-5379
> URL: https://issues.apache.org/jira/browse/BEAM-5379
> Project: Beam
> Issue Type: Improvement
> Components: sdk-go
> Reporter: Robert Burke
> Assignee: Robert Burke
> Priority: Major
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> This would make it easier for non-Go developers to update and test changes to
> the Go SDK without jumping through hoops to set up Go Paths at first.
> Right now, we use the gogradle plugin for gradle to handle reproducible
> builds. Without doing something with the GOPATH relative to a user's local
> git repo though, changes made in the user's repo are not represented when
> gradle is invoked to test everything.
> One of at least the following needs to be accomplished:
> * gogradle moves to support the Go Modules experiment in Go 1.11, and the SDK
> migrates to that
> * or we re-implement our gradle go rules ourselves to use them,
> * or some third option, that moves away from the GOPATH nit.
> This issue should be resolved after deciding and implementing a clear
> versioning story for the SDK, ideally along Go best practices.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)