Thanks for the super-detailed explanation and illustration. I especially think you emphasized something we generally have not considered: the end-user pinning a different version of a client lib, and the fact that if we vendor the client lib they cannot change the version. To me, that makes vendoring client libs kind of a deal-breaker. So I think:
a) yes, solved by vendored (already done) b) leaving the IO's version of Guava on the classpath seems the only solution, and end-user has to resolve c) good point that in the not-on-API-surface case also, we can't really tweak anything and we need the end-user to resolve So the main action item is to remove the top-level Guava version and to let IOs determine their Guava version, with resolution of diamond deps when needed in test configurations. Kenn On Wed, Mar 6, 2019 at 3:18 AM Michael Luckey <adude3...@gmail.com> wrote: > Unfortunately, I am unable to see, how option 2 could be done from beam > side? Or is it meant to delay to the end user? > > Let me try to elaborate: > > Assume user implements a pipeline with 2 different IOs, both depending on > incompatible versions of e.g. guava. Also, user code has a (direct or > transitive) dependency to > yet another incompatible version of guava. I ll try to get into more > details later, but for now assume > > AIO ---- ParDo ---- ParDo ---- Pardo ---- BIO > | | > | > guava-X.jar guava-Y.jar > guava-Z.jar > > What currently happens (AFAIU) is during build we do pin the versions of > guava to some fixed version [1], which means AIO and BIO are compiled with > guava-20, which might or might not > be ABI compatible to guava-X and/or guava-Z. (This is the problem these 2 > users encountered, right?) > > Now the user trying to run this pipeline has to decide, which version to > use. Whatever she chooses the might be incompatibilities, She could do so > with maven, and of course also with Gradle or any other build system > (hopefully). > > If we now replace guava.20 with the newest on build, lets say guava-LATEST > which happens to be a dependency of CIO, how would that change the game. So > is there anything beam could do about it? > > Now I try to understand those deps in more detail. We have those IOs, and > the user code thingy. I assume, beam code switched to vendors guava, so > nothing is exposed here, we only have issues on external interfaces as IOs. > > 1. The user code. > Whether guava is used directly or transitive by some used library > dependency, there is nothing beam can do about it. The user has to solve > this issue by herself. She will probably encounter exactly the same > problems as beam within the IOs, > > 2. IOs > > a: Beam IO code uses guava, no transitive deps. This is solved by > vendoring, right? > > SomeIO <--- guava-20.jar is replaced by > SomeIO <--- vendored_guava_v20.ja > > b: Beam uses some client lib (e.g. some-client-core) which has a > dependency on guava but does not expose any guava class to the outside > > SomeIO <---- some-client-lib <---- guava-X.jar > > In this case beam 'could' relocate the client lib and 'solve' that guava > exposing on the class path, *but* we might be tight to a specific version > of the backend. as the user can not replace the 'some-client-core' > dependency with a different version (hoping there are no interop issues > with beams IO) > I guess, we are better off, not doing that? Which unfortuantely means > delaying version resolution to the end user, as transitive dependency will > end up on the end user runtime classpath and might cause diamond > dependency issues. > > c: Beam uses some client lib (e.g. bigquery-client-core) which has a > dependency on guava which does expose guava classes in its external API > > SomeIO <---- some-client-lib <---- guava-X.jar > | | > ----------------------------------------------- > > We could of course also try to shade here with above drawbacks, but this > might not even work as the client lib communicates with the backend and > might also expose guava there. So the best we could do is, as Kenn stated, > to ensure interoperability SomeIO with the client. > > So to my current understanding I second Kenn here. We should stop forcing > a specific version of (of course uncensored) guava and align on IO basis to > the requirements of dependencies. Still, on that lower (IO) level we might > need to force some version as already there we might have diamond deps on > guava. But I think we are better of not to force a beam global version here. > > Of course, that will force the user to handle the problem, but she needs > to do it anyway, and, unfortunately me also fails to see anything we could > do better here. (At least current forcing to v20 seems to worsen the > problem). > > [1] > https://github.com/apache/beam/blob/master/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L351 > > On Tue, Mar 5, 2019 at 11:23 PM Gleb Kanterov <g...@spotify.com> wrote: > >> I agree with the points that Kenneth has raised, mainly: >> >> > In both of the above approaches, diamond dependency problems between >> IOs are quite possible. >> >> Option 1. With having more IO-s in Beam we would start hitting diamond >> dependency problem more often. Relocating dependencies will help, but for >> this, we should avoid exposing relocated classes to end-users of IO-s. I >> can't speak about everything, but in the case of BigtableIO, it only >> exposes proto-classes that aren't part of bigtable-client-core, and >> shouldn't be relocated. >> >> Option 2. Without relocation, every other IO can be potentially broken, >> and we can solve this problem on a case-by-case basis. In maven >> world situation becomes a little better with requireUpperBoundDeps [1] from >> maven-enforcer-plugin. I don't know if there is a similar solution for >> gradle. >> >> Option 3. There is a potential future solution for dependency conflicts >> between IO-s with Java 9 JPMS [2], however, it could take a while before we >> could use it due to compatibility issues. >> >> As a short term solution, option 2 seems the best, we could go through >> known conflicts and see if it's possible to resolve them, potentially >> looking into option 1 that would take much more time. >> >> [1]: >> https://maven.apache.org/enforcer/enforcer-rules/requireUpperBoundDeps.html >> [2]: https://en.wikipedia.org/wiki/Java_Platform_Module_System >> >> >> On Mon, Mar 4, 2019 at 4:45 PM Ismaël Mejía <ieme...@gmail.com> wrote: >> >>> That looks interesting but I am not sure if I understand correctly, >>> isn't the problem that the system API (Bigtable, Cassandra, etc) >>> exposes guava related stuff? Or in other words, wouldn't the >>> transitivie version of guava leak anyway? >>> If it does not I am pretty interested on doing this to fix the >>> Cassandra IO from leaking too. >>> https://issues.apache.org/jira/browse/BEAM-5723 >>> >>> On Thu, Feb 28, 2019 at 5:17 PM Kenneth Knowles <k...@apache.org> wrote: >>> > >>> > If someone is using BigTableIO with bigtable-client-core then having >>> BigTableIO and bigtable-client-core both depend on Guava 26.0 is fine, >>> right? Specifically, a user of BigTableIO after >>> https://github.com/apache/beam/pull/7957 will still have non-vendored >>> Guava on the classpath due to the transitive deps of bigtable-client-core. >>> > >>> > In any case it seems very wrong for the Beam root project to manage >>> the version of Guava in BigTableIO since the whole point is to be >>> compatible with bigtable-client-core. Would it work to delete our pinned >>> Guava version [1] and chase down all the places it breaks, moving Guava >>> dependencies local to places where an IO or extension must use it for >>> interop? Then you don't need adapters. >>> > >>> > In both of the above approaches, diamond dependency problems between >>> IOs are quite possible. >>> > >>> > I don't know if we can do better. For example, producing a >>> bigtable-client-core where we have relocated Guava internally and using >>> that could really be an interop nightmare as things that look like the same >>> type would not be. Less likely to be broken would be bigtable-client-core >>> entirely relocated and vendored, but generally IO connectors exchange >>> objects with users and the users would have to use the relocated versions, >>> so that's gross. >>> > >>> > Kenn >>> > >>> > [1] >>> https://github.com/apache/beam/blob/master/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L353 >>> > >>> > >>> > On Thu, Feb 28, 2019 at 2:29 AM Gleb Kanterov <g...@spotify.com> >>> wrote: >>> >> >>> >> For the past week, two independent people have asked me if I can help >>> with guava NoSuchMethodError in BigtableIO. It turns out we still have a >>> potential problem with dependencies that don't vendor guava, in this case, >>> it was bigtable-client-core that depends on guava-26.0. However, the root >>> cause of the classpath problem was in the usage of a deprecated method from >>> non-vendored guava in BigtableServiceClientImpl in the code path where we >>> integrate with bigtable client. >>> >> >>> >> I created apache/beam#7957 [1] to fix that. There few other IO-s >>> where we use non-vendored guava that we can fix using adapters. >>> >> >>> >> And there is an unknown number of conflicts between guava versions in >>> our dependencies that don't vendor it, that as I understand it, could be >>> fixed by relocating them, in a similar way we do for Calcite [2]. >>> >> >>> >> [1]: https://github.com/apache/beam/pull/7957 >>> >> [2]: >>> https://github.com/apache/beam/blob/61de62ecbe8658de866280a8976030a0cb877041/sdks/java/extensions/sql/build.gradle#L30-L39 >>> >> >>> >> Gleb >>> >> >>> >> On Sun, Jan 20, 2019 at 11:43 AM Gleb Kanterov <g...@spotify.com> >>> wrote: >>> >>> >>> >>> I didn't look deep into it, but it seems we can put >>> .idea/codeInsightSettings.xml into our repository where we blacklist >>> packages from auto-import. See an example in >>> JetBrains/kotlin/.idea/codeInsightSettings.xml. >>> >>> >>> >>> On Sat, Jan 19, 2019 at 8:03 PM Reuven Lax <re...@google.com> wrote: >>> >>>> >>> >>>> Bad IDEs automatically generate the wrong import. I think we need >>> to automatically prevent this, otherwise the bad imports will inevitably >>> slip back in. >>> >>>> >>> >>>> Reuven >>> >>>> >>> >>>> On Tue, Jan 15, 2019 at 2:54 AM Łukasz Gajowy < >>> lukasz.gaj...@gmail.com> wrote: >>> >>>>> >>> >>>>> Great news. Thanks all for this work! >>> >>>>> >>> >>>>> +1 to enforcing this on dependency level as Kenn suggested. >>> >>>>> >>> >>>>> Łukasz >>> >>>>> >>> >>>>> wt., 15 sty 2019 o 01:18 Kenneth Knowles <k...@apache.org> >>> napisał(a): >>> >>>>>> >>> >>>>>> We can enforce at the dependency level, since it is a compile >>> error. I think some IDEs and build tools may allow the compile-time >>> classpath to get polluted by transitive runtime deps, so protecting against >>> bad imports is also a good idea. >>> >>>>>> >>> >>>>>> Kenn >>> >>>>>> >>> >>>>>> On Mon, Jan 14, 2019 at 8:42 AM Ismaël Mejía <ieme...@gmail.com> >>> wrote: >>> >>>>>>> >>> >>>>>>> Not yet, we need to add that too, there are still some tasks to >>> be >>> >>>>>>> done like improve the contribution guide with this info, and >>> document >>> >>>>>>> how to generate a src build artifact locally since I doubt we >>> can >>> >>>>>>> publish that into Apache for copyright reasons. >>> >>>>>>> I will message in the future for awareness for awareness when >>> most of >>> >>>>>>> the pending tasks are finished. >>> >>>>>>> >>> >>>>>>> >>> >>>>>>> On Mon, Jan 14, 2019 at 3:51 PM Maximilian Michels < >>> m...@apache.org> wrote: >>> >>>>>>> > >>> >>>>>>> > Thanks for the heads up, Ismaël! Great to see the vendored >>> Guava version is used >>> >>>>>>> > everywhere now. >>> >>>>>>> > >>> >>>>>>> > Do we already have a Checkstyle rule that prevents people from >>> using the >>> >>>>>>> > unvendored Guava? If not, such a rule could be useful because >>> it's almost >>> >>>>>>> > inevitable that the unvedored Guava will slip back in. >>> >>>>>>> > >>> >>>>>>> > Cheers, >>> >>>>>>> > Max >>> >>>>>>> > >>> >>>>>>> > On 14.01.19 05:55, Ismaël Mejía wrote: >>> >>>>>>> > > We merged today the PR [1] that changes most of the code to >>> use our >>> >>>>>>> > > new guava vendored dependency. In practice it means that >>> most of the >>> >>>>>>> > > imports of the classes were changed from >>> `com.google.common.` to >>> >>>>>>> > > `org.apache.beam.vendor.guava.v20_0.com.google.common.` >>> >>>>>>> > > >>> >>>>>>> > > This is a great improvement to fix a long existing problem >>> of guava >>> >>>>>>> > > leaking through some Beam modules. This also reduces the >>> size of most >>> >>>>>>> > > jars in the project because they don't need to relocate and >>> include >>> >>>>>>> > > guava anymore, they just use the vendored dependency. >>> >>>>>>> > > >>> >>>>>>> > > Kudos to Kenn Knowles, Lukasz Cwik, Scott Wegner and the >>> others that >>> >>>>>>> > > worked (are working) to make this possible. >>> >>>>>>> > > >>> >>>>>>> > > Sadly as a side effect of the merge of this PR multiple PRs >>> were >>> >>>>>>> > > broken so please review if yours was and do a rebase and fix >>> the >>> >>>>>>> > > imports to use the new vendored dependency. Sorry for the >>> >>>>>>> > > inconvenience. From now one all uses of guava should use the >>> vendored >>> >>>>>>> > > version. Expect some updates in the docs. >>> >>>>>>> > > >>> >>>>>>> > > [1] https://github.com/apache/beam/pull/6809 >>> >>>>>>> > > >>> >>> >>> >>> >>> >>> >>> >>> -- >>> >>> Cheers, >>> >>> Gleb >>> >> >>> >> >>> >> >>> >> -- >>> >> Cheers, >>> >> Gleb >>> >> >> >> -- >> Cheers, >> Gleb >> >