Thanks for the super-detailed explanation and illustration.

I especially think you emphasized something we generally have not
considered: the end-user pinning a different version of a client lib, and
the fact that if we vendor the client lib they cannot change the version.
To me, that makes vendoring client libs kind of a deal-breaker. So I think:

a) yes, solved by vendored (already done)
b) leaving the IO's version of Guava on the classpath seems the only
solution, and end-user has to resolve
c) good point that in the not-on-API-surface case also, we can't really
tweak anything and we need the end-user to resolve

So the main action item is to remove the top-level Guava version and to let
IOs determine their Guava version, with resolution of diamond deps when
needed in test configurations.

Kenn

On Wed, Mar 6, 2019 at 3:18 AM Michael Luckey <adude3...@gmail.com> wrote:

> Unfortunately, I am unable to see, how option 2 could be done from beam
> side? Or is it meant to delay to the end user?
>
> Let me try to elaborate:
>
> Assume user implements a pipeline with 2 different IOs, both depending on
> incompatible versions of e.g. guava. Also, user code has a (direct or
> transitive) dependency to
> yet another incompatible version of guava. I ll try to get into more
> details later, but for now assume
>
>      AIO   ----     ParDo    ----    ParDo   ----     Pardo   ----    BIO
>        |                                          |
>                     |
>     guava-X.jar                       guava-Y.jar
>  guava-Z.jar
>
> What currently happens (AFAIU) is during build we do pin the versions of
> guava to some fixed version [1], which means AIO and BIO are compiled with
> guava-20, which might or might not
> be ABI compatible to guava-X and/or guava-Z. (This is the problem these 2
> users encountered, right?)
>
> Now the user trying to run this pipeline has to decide, which version to
> use. Whatever she chooses the might be incompatibilities, She could do so
> with maven, and of course also with Gradle or any other build system
> (hopefully).
>
> If we now replace guava.20 with the newest on build, lets say guava-LATEST
> which happens to be a dependency of CIO, how would that change the game. So
> is there anything beam could do about it?
>
> Now I try to understand those deps in more detail. We have those IOs, and
> the user code thingy. I assume, beam code switched to vendors guava, so
> nothing is exposed here, we only have issues on external interfaces as IOs.
>
> 1. The user code.
> Whether guava is used directly or transitive by some used library
> dependency, there is nothing beam can do about it. The user has to solve
> this issue by herself. She will probably encounter exactly the same
> problems as beam within the IOs,
>
> 2. IOs
>
> a: Beam IO code uses guava, no transitive deps. This is solved by
> vendoring, right?
>
>    SomeIO <--- guava-20.jar           is replaced by
>  SomeIO <--- vendored_guava_v20.ja
>
> b: Beam uses some client lib (e.g. some-client-core) which has a
> dependency on guava but does not expose any guava class to the outside
>
>    SomeIO <---- some-client-lib <---- guava-X.jar
>
>  In this case beam 'could' relocate the client lib and 'solve' that guava
> exposing on the class path, *but* we might be tight to a specific version
> of the backend. as the user can not replace the 'some-client-core'
> dependency with a different version (hoping there are no interop issues
> with beams IO)
> I guess, we are better off, not doing that? Which unfortuantely means
> delaying version resolution to the end user, as transitive dependency will
> end up on the end user runtime classpath and might cause diamond
> dependency issues.
>
> c: Beam uses some client lib (e.g. bigquery-client-core) which has a
> dependency on guava which does expose guava classes in its external API
>
>    SomeIO <---- some-client-lib <---- guava-X.jar
>       |                                                      |
>       -----------------------------------------------
>
> We could of course also try to shade here with above drawbacks, but this
> might not even work as the client lib communicates with the backend and
> might also expose guava there. So the best we could do is, as Kenn stated,
> to ensure interoperability SomeIO with the client.
>
> So to my current understanding I second Kenn here. We should stop forcing
> a specific version of (of course uncensored) guava and align on IO basis to
> the requirements of dependencies. Still, on that lower (IO) level we might
> need to force some version as already there we might have diamond deps on
> guava. But I think we are better of not to force a beam global version here.
>
> Of course, that will force the user to handle the problem, but she needs
> to do it anyway, and, unfortunately me also fails to see anything we could
> do better here. (At least current forcing to v20 seems to worsen the
> problem).
>
> [1]
> https://github.com/apache/beam/blob/master/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L351
>
> On Tue, Mar 5, 2019 at 11:23 PM Gleb Kanterov <g...@spotify.com> wrote:
>
>> I agree with the points that Kenneth has raised, mainly:
>>
>> > In both of the above approaches, diamond dependency problems between
>> IOs are quite possible.
>>
>> Option 1. With having more IO-s in Beam we would start hitting diamond
>> dependency problem more often. Relocating dependencies will help, but for
>> this, we should avoid exposing relocated classes to end-users of IO-s. I
>> can't speak about everything, but in the case of BigtableIO, it only
>> exposes proto-classes that aren't part of bigtable-client-core, and
>> shouldn't be relocated.
>>
>> Option 2. Without relocation, every other IO can be potentially broken,
>> and we can solve this problem on a case-by-case basis. In maven
>> world situation becomes a little better with requireUpperBoundDeps [1] from
>> maven-enforcer-plugin. I don't know if there is a similar solution for
>> gradle.
>>
>> Option 3. There is a potential future solution for dependency conflicts
>> between IO-s with Java 9 JPMS [2], however, it could take a while before we
>> could use it due to compatibility issues.
>>
>> As a short term solution, option 2 seems the best, we could go through
>> known conflicts and see if it's possible to resolve them, potentially
>> looking into option 1 that would take much more time.
>>
>> [1]:
>> https://maven.apache.org/enforcer/enforcer-rules/requireUpperBoundDeps.html
>> [2]: https://en.wikipedia.org/wiki/Java_Platform_Module_System
>>
>>
>> On Mon, Mar 4, 2019 at 4:45 PM Ismaël Mejía <ieme...@gmail.com> wrote:
>>
>>> That looks interesting but I am not sure if I understand correctly,
>>> isn't the problem that the system API (Bigtable, Cassandra, etc)
>>> exposes guava related stuff? Or in other words, wouldn't the
>>> transitivie version of guava leak anyway?
>>> If it does not I am pretty interested on doing this to fix the
>>> Cassandra IO from leaking too.
>>> https://issues.apache.org/jira/browse/BEAM-5723
>>>
>>> On Thu, Feb 28, 2019 at 5:17 PM Kenneth Knowles <k...@apache.org> wrote:
>>> >
>>> > If someone is using BigTableIO with bigtable-client-core then having
>>> BigTableIO and bigtable-client-core both depend on Guava 26.0 is fine,
>>> right? Specifically, a user of BigTableIO after
>>> https://github.com/apache/beam/pull/7957 will still have non-vendored
>>> Guava on the classpath due to the transitive deps of bigtable-client-core.
>>> >
>>> > In any case it seems very wrong for the Beam root project to manage
>>> the version of Guava in BigTableIO since the whole point is to be
>>> compatible with bigtable-client-core. Would it work to delete our pinned
>>> Guava version [1] and chase down all the places it breaks, moving Guava
>>> dependencies local to places where an IO or extension must use it for
>>> interop? Then you don't need adapters.
>>> >
>>> > In both of the above approaches, diamond dependency problems between
>>> IOs are quite possible.
>>> >
>>> > I don't know if we can do better. For example, producing a
>>> bigtable-client-core where we have relocated Guava internally and using
>>> that could really be an interop nightmare as things that look like the same
>>> type would not be. Less likely to be broken would be bigtable-client-core
>>> entirely relocated and vendored, but generally IO connectors exchange
>>> objects with users and the users would have to use the relocated versions,
>>> so that's gross.
>>> >
>>> > Kenn
>>> >
>>> > [1]
>>> https://github.com/apache/beam/blob/master/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L353
>>> >
>>> >
>>> > On Thu, Feb 28, 2019 at 2:29 AM Gleb Kanterov <g...@spotify.com>
>>> wrote:
>>> >>
>>> >> For the past week, two independent people have asked me if I can help
>>> with guava NoSuchMethodError in BigtableIO. It turns out we still have a
>>> potential problem with dependencies that don't vendor guava, in this case,
>>> it was bigtable-client-core that depends on guava-26.0. However, the root
>>> cause of the classpath problem was in the usage of a deprecated method from
>>> non-vendored guava in BigtableServiceClientImpl in the code path where we
>>> integrate with bigtable client.
>>> >>
>>> >> I created apache/beam#7957 [1] to fix that. There few other IO-s
>>> where we use non-vendored guava that we can fix using adapters.
>>> >>
>>> >> And there is an unknown number of conflicts between guava versions in
>>> our dependencies that don't vendor it, that as I understand it, could be
>>> fixed by relocating them, in a similar way we do for Calcite [2].
>>> >>
>>> >> [1]: https://github.com/apache/beam/pull/7957
>>> >> [2]:
>>> https://github.com/apache/beam/blob/61de62ecbe8658de866280a8976030a0cb877041/sdks/java/extensions/sql/build.gradle#L30-L39
>>> >>
>>> >> Gleb
>>> >>
>>> >> On Sun, Jan 20, 2019 at 11:43 AM Gleb Kanterov <g...@spotify.com>
>>> wrote:
>>> >>>
>>> >>> I didn't look deep into it, but it seems we can put
>>> .idea/codeInsightSettings.xml into our repository where we blacklist
>>> packages from auto-import. See an example in
>>> JetBrains/kotlin/.idea/codeInsightSettings.xml.
>>> >>>
>>> >>> On Sat, Jan 19, 2019 at 8:03 PM Reuven Lax <re...@google.com> wrote:
>>> >>>>
>>> >>>> Bad IDEs automatically generate the wrong import. I think we need
>>> to automatically prevent this, otherwise the bad imports will inevitably
>>> slip back in.
>>> >>>>
>>> >>>> Reuven
>>> >>>>
>>> >>>> On Tue, Jan 15, 2019 at 2:54 AM Łukasz Gajowy <
>>> lukasz.gaj...@gmail.com> wrote:
>>> >>>>>
>>> >>>>> Great news. Thanks all for this work!
>>> >>>>>
>>> >>>>> +1 to enforcing this on dependency level as Kenn suggested.
>>> >>>>>
>>> >>>>> Łukasz
>>> >>>>>
>>> >>>>> wt., 15 sty 2019 o 01:18 Kenneth Knowles <k...@apache.org>
>>> napisał(a):
>>> >>>>>>
>>> >>>>>> We can enforce at the dependency level, since it is a compile
>>> error. I think some IDEs and build tools may allow the compile-time
>>> classpath to get polluted by transitive runtime deps, so protecting against
>>> bad imports is also a good idea.
>>> >>>>>>
>>> >>>>>> Kenn
>>> >>>>>>
>>> >>>>>> On Mon, Jan 14, 2019 at 8:42 AM Ismaël Mejía <ieme...@gmail.com>
>>> wrote:
>>> >>>>>>>
>>> >>>>>>> Not yet, we need to add that too, there are still some tasks to
>>> be
>>> >>>>>>> done like improve the contribution guide with this info, and
>>> document
>>> >>>>>>> how to  generate a src build artifact locally since I doubt we
>>> can
>>> >>>>>>> publish that into Apache for copyright reasons.
>>> >>>>>>> I will message in the future for awareness for awareness when
>>> most of
>>> >>>>>>> the pending tasks are finished.
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> On Mon, Jan 14, 2019 at 3:51 PM Maximilian Michels <
>>> m...@apache.org> wrote:
>>> >>>>>>> >
>>> >>>>>>> > Thanks for the heads up, Ismaël! Great to see the vendored
>>> Guava version is used
>>> >>>>>>> > everywhere now.
>>> >>>>>>> >
>>> >>>>>>> > Do we already have a Checkstyle rule that prevents people from
>>> using the
>>> >>>>>>> > unvendored Guava? If not, such a rule could be useful because
>>> it's almost
>>> >>>>>>> > inevitable that the unvedored Guava will slip back in.
>>> >>>>>>> >
>>> >>>>>>> > Cheers,
>>> >>>>>>> > Max
>>> >>>>>>> >
>>> >>>>>>> > On 14.01.19 05:55, Ismaël Mejía wrote:
>>> >>>>>>> > > We merged today the PR [1] that changes most of the code to
>>> use our
>>> >>>>>>> > > new guava vendored dependency. In practice it means that
>>> most of the
>>> >>>>>>> > > imports of the classes were changed from
>>> `com.google.common.` to
>>> >>>>>>> > > `org.apache.beam.vendor.guava.v20_0.com.google.common.`
>>> >>>>>>> > >
>>> >>>>>>> > > This is a great improvement to fix a long existing problem
>>> of guava
>>> >>>>>>> > > leaking through some Beam modules. This also reduces the
>>> size of most
>>> >>>>>>> > > jars in the project because they don't need to relocate and
>>> include
>>> >>>>>>> > > guava anymore, they just use the vendored dependency.
>>> >>>>>>> > >
>>> >>>>>>> > > Kudos to Kenn Knowles, Lukasz Cwik, Scott Wegner and the
>>> others that
>>> >>>>>>> > > worked (are working) to make this possible.
>>> >>>>>>> > >
>>> >>>>>>> > > Sadly as a side effect of the merge of this PR multiple PRs
>>> were
>>> >>>>>>> > > broken so please review if yours was and do a rebase and fix
>>> the
>>> >>>>>>> > > imports to use the new vendored dependency. Sorry for the
>>> >>>>>>> > > inconvenience. From now one all uses of guava should use the
>>> vendored
>>> >>>>>>> > > version. Expect some updates in the docs.
>>> >>>>>>> > >
>>> >>>>>>> > > [1]  https://github.com/apache/beam/pull/6809
>>> >>>>>>> > >
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Cheers,
>>> >>> Gleb
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Cheers,
>>> >> Gleb
>>>
>>
>>
>> --
>> Cheers,
>> Gleb
>>
>

Reply via email to