On Wed, Jan 19, 2022 at 10:59 AM Brian Hulette <bhule...@google.com> wrote:
> > But I believe the issue is that some tooling will bundle up the "compile" dependencies and submit with the job, which will then have a conflict with the libraries on the cluster.
>
> Do you know specifically what tooling does this? Also, I'm not clear what you mean by "cluster": is this referring to remote workers?
>

No, I don't, actually. I was being deliberately vague. I think "cluster" could be Spark (or Flink) workers, etc. There are some flags to tweak whose dependencies win. Dataflow workers could also have this issue. I guess something building an uber jar would be the typical example.

To be super clear: I'm pretty much in agreement with Daniel, but I also think there must be some good reason, which we should account for, that it isn't done that way already. To unblock the release, Emily has opened a PR to match the prior POM exactly, which I also think is smart. We can fix up or eliminate `provided` deps one or a few at a time going forward.

Kenn

>
> On Tue, Jan 18, 2022 at 6:11 AM Kenneth Knowles <k...@apache.org> wrote:
>
>>
>> On Fri, Jan 14, 2022 at 9:34 AM Daniel Collins <dpcoll...@google.com> wrote:
>>
>>> > In particular the Hadoop/Spark and Kafka dependencies must be **provided** as they were. I am not sure of others but those three matter.
>>>
>>> I think there's a bit of a difference here between what should be the state in the short term versus the long term.
>>>
>>> In the short term, I agree that we should avoid changes to how these dependencies are reflected in the POM.
>>>
>>> In the long term, I don't think it makes sense for these to continue to be "provided" dependencies: if users wish to use a different version of Hadoop, Spark or Kafka, they can explicitly override the dependencies with the version they want when building their JAR, even if there is a version listed as "compile" in the POM file on Maven Central.
>>> The only difference is that if they don't have a version preference, the one listed in the POM (that we tested with) will be used, which seems like an unambiguous win to me.
>>>
>>
>> Agree with the sentiment. But I believe the issue is that some tooling will bundle up the "compile" dependencies and submit with the job, which will then have a conflict with the libraries on the cluster. On the other hand, the user will always want to override the "provided" version to match the cluster, in which case it will just be harmless duplicates on the classpath, no? I guess huge file size, but it isn't the 90s anymore. Since Ismaël commented, maybe he can help to clarify. I also knew about this reasoning for Spark & Hadoop, but I don't know exactly what is required to make it work right.
>>
>> This could become a bothersome issue long term: the Gradle dev community has lots of posts indicating they don't agree with the existence of "provided" or "optional" dependencies. (I happen to agree with them, but philosophy is not the point.) We should have a very clear solution for the cases that require one, and document it, at least on the wiki.
>>
>> Kenn
>>
>>>
>>> -Daniel
>>>
>>> On Thu, Jan 13, 2022 at 4:19 PM Ismaël Mejía <ieme...@gmail.com> wrote:
>>>
>>>> Optional dependencies should not be a major issue.
>>>>
>>>> To validate that we are not breaking users, what matters is comparing the generated POM files with the previous (pre-Gradle 7 / 2.35.0) version and checking that what was provided is still provided.
>>>>
>>>> In particular the Hadoop/Spark and Kafka dependencies must be **provided** as they were. I am not sure of others but those three matter.
>>>>
>>>> Ismaël
>>>>
>>>> On Wed, Jan 12, 2022 at 10:55 PM Emily Ye <emil...@google.com> wrote:
>>>> >
>>>> > We've chatted offline and have a tentative plan for what to do with these dependencies that are currently marked as compileOnly (instead of provided). Please review the list if possible [1].
>>>> >
>>>> > Two projects we aren't sure about:
>>>> >
>>>> > :sdks:java:io:hcatalog
>>>> >
>>>> > library.java.jackson_annotations
>>>> > library.java.jackson_core
>>>> > library.java.jackson_databind
>>>> > library.java.hadoop_common
>>>> > org.apache.hive:hive-exec
>>>> > org.apache.hive.hcatalog:hive-hcatalog-core
>>>> >
>>>> > :sdks:java:io:parquet
>>>> >
>>>> > library.java.hadoop_client
>>>> >
>>>> > Does anyone have experience with either of these IOs? ccing Chamikara
>>>> >
>>>> > Thank you,
>>>> > Emily
>>>> >
>>>> > [1] https://docs.google.com/spreadsheets/d/1UpeQtx1PoAgeSmpKxZC9lv3B9G1c7cryW3iICfRtG1o/edit?usp=sharing
>>>> >
>>>> > On Tue, Jan 11, 2022 at 6:38 PM Emily Ye <emil...@google.com> wrote:
>>>> >>
>>>> >> As the person volunteering to do fixes for this to unblock Beam 2.36.0, I created a spreadsheet of the projects with dependencies changed from provided to compileOnly [1]. I pre-filled it with what I think things should be, but I don't have very much background in Java/Maven/Gradle configurations, so please give input!
>>>> >>
>>>> >> Some (mainly Hadoop/Kafka) I left blank, since I'm not sure: do we keep them provided because it depends on the user's version?
>>>> >>
>>>> >> [1] https://docs.google.com/spreadsheets/d/1UpeQtx1PoAgeSmpKxZC9lv3B9G1c7cryW3iICfRtG1o/edit?usp=sharing
>>>> >>
>>>> >> On Tue, Jan 11, 2022 at 1:17 PM Luke Cwik <lc...@google.com> wrote:
>>>> >>>
>>>> >>> I'm not following what you're trying to say, Kenn, since "provided" in Maven requires the user to explicitly add the dependency themselves to make it part of their runtime.
>>>> >>>
>>>> >>> As per https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#dependency-scope
>>>> >>> "
>>>> >>> * provided
>>>> >>> This is much like compile, but indicates you expect the JDK or a container to provide the dependency at runtime. For example, when building a web application for the Java Enterprise Edition, you would set the dependency on the Servlet API and related Java EE APIs to scope provided because the web container provides those classes. A dependency with this scope is added to the classpath used for compilation and test, but not the runtime classpath. It is not transitive."
>>>> >>>
>>>> >>> On Tue, Jan 11, 2022 at 11:54 AM Kenneth Knowles <k...@apache.org> wrote:
>>>> >>>>
>>>> >>>> To clarify: "provided" should have been in the test runtime configuration, but not in the shipped runtime configuration (otherwise dep resolution for users would pull in provided deps, which should not happen).
>>>> >>>>
>>>> >>>> On Thu, Dec 30, 2021 at 10:05 AM Luke Cwik <lc...@google.com> wrote:
>>>> >>>>>
>>>> >>>>> During the migration to Gradle 7[1], the propdeps plugin was removed[2] since there wasn't a newer version compatible with Gradle 7 and a replacement couldn't be found. All existing usages of "provided" were moved to "compileOnly", and "compileOnly" is being mapped to the "provided" Maven scope in the generated POM files.
>>>> >>>>> This has led to two issues:
>>>> >>>>> 1) provided was also part of the runtime configuration, so we are getting a few class not found exceptions when running tests [3]
>>>> >>>>> 2) the generated pom.xml files will have a bunch of compile-time-only annotations added as provided dependencies [4]
>>>> >>>>>
>>>> >>>>> #1 can be fixed by adding the dependency to both the "compileOnly" and "runtimeOnly" configurations, or by adding the dependency to the "implementation" configuration.
>>>> >>>>> #2 will make the POM files messier, which can lead to confusion for users, but shouldn't impact existing uses.
>>>> >>>>>
>>>> >>>>> There was a suggestion[4] to completely remove the usage of provided from the generated pom.xml and have all our previously "provided" dependencies declared as "implementation", allowing us to solve both #1 and #2 above.
>>>> >>>>>
>>>> >>>>> The largest usage of "provided" in the past was for packages related to the Hadoop ecosystem, followed by packages such as junit/hamcrest/aircompressor in sdks/java/core, which aren't required to use the module but can provide additional features if the dependency exists.
>>>> >>>>>
>>>> >>>>> What, if anything, should we migrate to the "implementation" configuration, or should we try to recreate what we were doing with the "provided" configuration in the past?
>>>> >>>>>
>>>> >>>>> 1: https://issues.apache.org/jira/browse/BEAM-13430
>>>> >>>>> 2: https://github.com/apache/beam/pull/16308
>>>> >>>>> 3: https://issues.apache.org/jira/browse/BEAM-13569
>>>> >>>>> 4: https://github.com/apache/beam/blob/fe456b79419d1a67ebf13d7d4b6695fa1aa6204d/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L964
>>>> >>>>> 5: https://issues.apache.org/jira/browse/BEAM-13504
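For readers following the scope discussion, here is a toy model of the Maven rules this thread relies on. A "provided" dependency is on the compile and test classpaths but not the runtime classpath, and it is not passed on to downstream users, whereas a "compile" dependency is inherited. This is a minimal Python sketch, not Beam, Gradle, or Maven code. The transitive-scope table is transcribed from the Maven dependency-mechanism guide linked above. The function names and the Kafka scenario are hypothetical and are included only for illustration.

```python
"""Toy model of Maven dependency scopes (illustration only).

The TRANSITIVE table is transcribed from Maven's "Introduction to the
Dependency Mechanism" guide. Function names and the example scenario are
hypothetical; this is not Beam, Gradle, or Maven code.
"""

from typing import Dict, Optional, Set

# TRANSITIVE[direct][transitive] -> scope the dependency gets in the user's
# project, or None when Maven drops it (provided/test deps are never inherited).
TRANSITIVE: Dict[str, Dict[str, Optional[str]]] = {
    "compile": {"compile": "compile", "provided": None, "runtime": "runtime", "test": None},
    "provided": {"compile": "provided", "provided": None, "runtime": "provided", "test": None},
    "runtime": {"compile": "runtime", "provided": None, "runtime": "runtime", "test": None},
    "test": {"compile": "test", "provided": None, "runtime": "test", "test": None},
}

# Classpaths on which each scope is available.
CLASSPATHS: Dict[str, Set[str]] = {
    "compile": {"compile", "test", "runtime"},
    "provided": {"compile", "test"},  # expected from the JDK/container at runtime
    "runtime": {"test", "runtime"},
    "test": {"test"},
}


def effective_scope(direct: str, transitive: str) -> Optional[str]:
    """Scope a user inherits for a dependency of one of their dependencies."""
    return TRANSITIVE[direct][transitive]


def on_user_runtime_classpath(module_scope: str, scope_in_module_pom: str) -> bool:
    """True if the user's application gets the library at runtime without
    declaring it themselves."""
    scope = effective_scope(module_scope, scope_in_module_pom)
    return scope is not None and "runtime" in CLASSPATHS[scope]


if __name__ == "__main__":
    # Hypothetical scenario: a user depends (compile scope) on a Beam IO module
    # whose POM lists a Kafka client library.
    # As "provided": not inherited, so the user must add it (ideally matching
    # the cluster's version).
    print(on_user_runtime_classpath("compile", "provided"))  # False
    # As "compile": the tested version is inherited by default.
    print(on_user_runtime_classpath("compile", "compile"))  # True
```

Under these assumptions, the sketch shows the trade-off being debated. Keeping Hadoop/Spark/Kafka as "provided" forces every user to declare a version themselves, ideally the one on their cluster. Using "compile" gives users the tested version by default, and they can still override it with their own direct declaration (Maven's nearest-wins mediation, which is not modeled here).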