[ 
https://issues.apache.org/jira/browse/BEAM-5379?focusedWorklogId=230458&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230458
 ]

ASF GitHub Bot logged work on BEAM-5379:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 20/Apr/19 17:59
            Start Date: 20/Apr/19 17:59
    Worklog Time Spent: 10m 
      Work Description: lostluck commented on issue #8354: [BEAM-5379] Go 
Modules versioning support
URL: https://github.com/apache/beam/pull/8354#issuecomment-485147347
 
 
   I'm fluent in wall of text, and am happy to provide all the context I have 
available. Let's get this done!
   +1 to discussing here at the code. I agree entirely.
   
   On Go Gradle:
   * I am ambivalent between keeping the plugin and switching to direct 
go tool usage (likely via shell scripts), so long as the tasks can still be 
triggered via Gradle. That keeps things simple for the test infrastructure.
     * There was a bit of [discussion on ditching gogradle recently, and I got 
reasonable 
pushback](https://lists.apache.org/thread.html/2754c89ef5f4ad850b98fa558ed245a5bdaa099c04bdcfbd0a34171f@%3Cdev.beam.apache.org%3E)
     * If we can address those concerns (I think we can), we can probably do it 
with the scripty + go tools approach.
   * Go Modules (committing both a go.mod & go.sum) do accomplish the goal of 
hermetic reproducible builds. There's a reasonable fear of external tooling by 
those who work with gradle more regularly.
   * Go Gradle likely sets up a GOPATH, which would disable modules by 
default unless the right environment variable is set, and then vendors the 
packages listed in the lock file.
   * You can find all the different Go "targets" by searching for 
[applyGoNature in this 
repo](https://github.com/apache/beam/search?q=applyGoNature&unscoped_q=applyGoNature).
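
   To illustrate the GOPATH point, here is a rough sketch of how the go 
command in Go 1.11 and 1.12 decides between GOPATH mode and module mode, based 
on the `GO111MODULE` environment variable. The helper `modulesEnabled` is 
hypothetical, and the claim that gogradle builds from under GOPATH/src is an 
assumption rather than something verified against the plugin.

```go
package main

import "fmt"

// modulesEnabled approximates the module-mode decision made by the go
// command in Go 1.11 and 1.12 (hypothetical helper, for illustration only):
//   - GO111MODULE=on always uses modules.
//   - GO111MODULE=off never uses modules.
//   - unset or "auto" uses modules only when the working directory is
//     outside GOPATH/src and a go.mod file is found.
func modulesEnabled(go111module string, insideGopathSrc, hasGoMod bool) bool {
	switch go111module {
	case "on":
		return true
	case "off":
		return false
	default: // "" or "auto"
		return !insideGopathSrc && hasGoMod
	}
}

func main() {
	// Assumption: a tool that builds from under GOPATH/src stays in
	// GOPATH mode unless GO111MODULE=on is set explicitly.
	fmt.Println(modulesEnabled("", true, true))   // false
	fmt.Println(modulesEnabled("on", true, true)) // true
	fmt.Println(modulesEnabled("", false, true))  // true
}
```

   So if the Gradle tasks keep building inside a GOPATH, they would need to 
set `GO111MODULE=on` for a committed go.mod and go.sum to take effect.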
 
   
   On the v0.X:
   * Sadly, as much as I like saying "it's in the documentation", 
discoverability of those statements is hard. The version number (such as it 
will display in a user's go.mod) is an unambiguous statement that's much harder 
to miss.
    * "There be Dragons" is largely due to the lack of mature native Go IO 
connectors. As I outlined recently in a dev list post on the [Go SDK 
status](https://lists.apache.org/thread.html/8f729da2d3009059d7a8b2d8624446be161700dcfa953939dd3530c6@%3Cdev.beam.apache.org%3E),
 the connectors have received very little love, if any, and aren't currently 
tested meaningfully.
     * I can't in good conscience recommend that users depend on those IOs at 
this time, even implicitly via the version numbering, since they won't do the 
job Beam promises, which is to allow you to write code that can scale.
     * It is possible to write scalable batch sinks with certain techniques 
(resharding via CoGBKs, expanding to 1 DoFn per input file, etc.), but I 
lean towards waiting for either Cross Language IO or runner support for 
Splittable DoFns (even just the initial splitting). Then we can implement the 
Go support for that and go straight to the final solution, rather than 
migrate through a few stages.
   * There's also the matter of importing packages that are v2: They need to 
have the v2 in the import path etc.
   * The actual version number largely speaks to the passage of time, but it is 
useful for runners and services to be able to say "The Go SDK versions X-Z all 
work on the FooBar runner!", and they can also maintain their own testing to be 
able to deprecate support for such versions later.
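
   To make the v2 point concrete, here is a minimal sketch of the semantic 
import versioning rule that Go Modules applies: major versions 0 and 1 use the 
bare module path, while v2 and later carry a `/vN` suffix in the import path. 
The module path `example.com/beam/sdks/go` and the helper `importPath` are 
hypothetical and chosen only for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// importPath returns the path a consumer would import for a given major
// version of a module under semantic import versioning: v0 and v1 use the
// bare module path, v2 and later append "/vN".
// Hypothetical helper, for illustration only.
func importPath(modulePath string, major int) string {
	if major < 2 {
		return modulePath
	}
	return modulePath + "/v" + strconv.Itoa(major)
}

func main() {
	// Hypothetical module path, not the SDK's real one.
	const mod = "example.com/beam/sdks/go"
	for _, major := range []int{0, 1, 2, 3} {
		fmt.Printf("v%d -> %s\n", major, importPath(mod, major))
	}
}
```

   Staying on v0.X therefore keeps the import path stable while the API is 
still allowed to change.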
   
   On Python and Java SDK versions being in sync:
   * This is a nuance from the history of the project. Python and Java have 
"pre-portability" implementations, where the SDK and the runner were tightly 
coupled on the worker. That is, every SDK+Runner pair required its own Worker. 
As you can imagine, this doesn't scale. Everything (runners + SDKs) *had* to be 
at the same version. 
   * Portability (via the FnAPI) allows each SDK to have its own worker-side 
harness that communicates with a runner-side harness. Then each runner and 
each SDK merely maintains its own half, simplifying both SDK and runner 
development.
   * The Go SDK has only ever had a "portable" implementation.
   
   On Independent Language Versions:
   * I personally don't see it making sense long term for SDKs to be on the 
same version. It's useful for users to know "the SDK didn't change, so my 
pipeline code will behave the same." 
   * Why should the Go SDK get a patch release when Java has a bug to fix?
   * If Generics ever happen for Go, and adopting them requires making API 
breaking changes, why should Java and Python get a major version bump for the 
sake of Go?
   * What matters is that an SDK version is compatible with runners on Beam 
vFoo.Bar, and that can be tested independently.
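
   As a sketch of what such an independent compatibility check could look 
like, the snippet below tests whether an SDK version falls inside the range a 
runner claims to support. All names (`version`, `parse`, `compare`, 
`supported`) and the version numbers are hypothetical, and the parsing only 
handles plain `vMAJOR.MINOR.PATCH` strings.

```go
package main

import "fmt"

// version is a parsed "vMAJOR.MINOR.PATCH" string.
type version struct{ major, minor, patch int }

// parse reads a version such as "v0.3.1". Any trailing pre-release or
// build suffix is ignored in this sketch.
func parse(s string) (version, error) {
	var v version
	if _, err := fmt.Sscanf(s, "v%d.%d.%d", &v.major, &v.minor, &v.patch); err != nil {
		return version{}, fmt.Errorf("bad version %q: %v", s, err)
	}
	return v, nil
}

// compare returns -1, 0, or 1 when a is older than, equal to, or newer than b.
func compare(a, b version) int {
	pairs := [][2]int{{a.major, b.major}, {a.minor, b.minor}, {a.patch, b.patch}}
	for _, p := range pairs {
		switch {
		case p[0] < p[1]:
			return -1
		case p[0] > p[1]:
			return 1
		}
	}
	return 0
}

// supported reports whether sdk lies in the inclusive range [oldest, newest].
func supported(sdk, oldest, newest string) (bool, error) {
	var vs [3]version
	for i, s := range []string{sdk, oldest, newest} {
		v, err := parse(s)
		if err != nil {
			return false, err
		}
		vs[i] = v
	}
	return compare(vs[0], vs[1]) >= 0 && compare(vs[0], vs[2]) <= 0, nil
}

func main() {
	// Hypothetical claim: a runner supports Go SDK v0.3.0 through v0.5.2.
	ok, err := supported("v0.4.0", "v0.3.0", "v0.5.2")
	fmt.Println(ok, err)
}
```

   A runner could keep a range like this next to its own test suite and 
narrow it when it deprecates older SDK versions.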
   
   On the version of Go being used:
   * Currently, IIRC the Jenkins machines are using Go 1.10. Our GoGradle is 
currently [configured to use 
go1.10](https://github.com/apache/beam/blob/9c8a8dc241296ad5003124c8bcb94a50bb53beb5/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L1310)
 and I'm not sure if we updated the Jenkins machines to have Go 1.12 when they 
recently broke.
   * We *should* provide some kind of statement of what versions of Go are 
supported, but this is another reason for being v0.X: we haven't made those 
kinds of commitments yet.
   * There's similar work being done for tracking supported Java and Python 
versions, so this seems reasonable.
   
   On having 2 modules, rather than one:
   * It's useful to be able to "retarget" the tests independently of the 
version of the Go SDK being used, e.g. to find regressions, run performance 
comparisons against different minor versions, etc.
     * Validating compatibility of Go SDK version X against runner version Y.
   * Sets us up better for the non-zero eventuality that the repo splits per 
SDK language once portability is complete. There would still be some work to 
do of course WRT path changes (unless we adopt Vanity Import URLs), but the 
modules would be independent.
   * Clean split of dependencies for things used to vet the SDK against 
different runners, vs what users actually need to use the SDK.
   
   I might be repeating myself from the doc a bit, but re-writing is handy in 
making thoughts more concrete.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 230458)
    Time Spent: 1h 50m  (was: 1h 40m)

> Go Modules versioning support
> -----------------------------
>
>                 Key: BEAM-5379
>                 URL: https://issues.apache.org/jira/browse/BEAM-5379
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-go
>            Reporter: Robert Burke
>            Assignee: Robert Burke
>            Priority: Major
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> This would make it easier for non-Go developers to update and test changes to 
> the Go SDK without jumping through hoops to set up Go Paths at first.
> Right now, we use the gogradle plugin for gradle to handle reproducible 
> builds. Without doing something with the GOPATH relative to a user's local 
> git repo though, changes made in the user's repo are not represented when 
> gradle is invoked to test everything.
> One of at least the following needs to be accomplished:
> * gogradle moves to support the Go Modules experiment in Go 1.11, and the SDK 
> migrates to that
> * or we re-implement our gradle go rules ourselves to use them, 
> * or some third option, that moves away from the GOPATH nit.
> This issue should be resolved after deciding and implementing a clear 
> versioning story for the SDK, ideally along Go best practices.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
