On Wed, Jan 19, 2022 at 10:59 AM Brian Hulette <bhule...@google.com> wrote:

> > But I believe the issue is that some tooling will bundle up the
> "compile" dependencies and submit with the job, which will then have a
> conflict with the libraries on the cluster.
>
> Do you know specifically what tooling does this? Also I'm not clear what
> you mean by "cluster", is this referring to remote workers?
>

No, I don't, actually. I was being deliberately vague. I think "cluster"
could be Spark (or Flink) workers, etc. There are some flags to tweak whose
dependencies win (the user's or the cluster's). Dataflow workers could have
this issue too. I guess something building an uber jar would be the typical
example.
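
For illustration, here's a hedged sketch of that failure mode with the
Gradle shadow plugin (the plugin version, coordinates, and versions are
hypothetical examples, not a recommendation):

  // A user's uber-jar build: shadowJar bundles the runtime classpath
  // (implementation/runtimeOnly deps), so any "compile"-scoped Kafka
  // client would be baked in and could clash with the cluster's version.
  plugins {
    id 'java'
    id 'com.github.johnrengelman.shadow' version '7.1.2'
  }

  dependencies {
    implementation 'org.apache.beam:beam-sdks-java-io-kafka:2.35.0'
    // With "provided" scope, the user opts in to the cluster's version
    // instead, and it stays out of the bundled jar:
    compileOnly 'org.apache.kafka:kafka-clients:2.4.1'
  }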

To be super clear: I'm pretty much in agreement with Daniel, but I also
think there must be some good reason it isn't done that way already, and we
should account for it.

To unblock the release, Emily has opened a PR to match the prior pom
exactly, which I also think is smart. We can fix up / eliminate `provided`
deps one or a few at a time going forward.

Kenn


>
> On Tue, Jan 18, 2022 at 6:11 AM Kenneth Knowles <k...@apache.org> wrote:
>
>>
>>
>> On Fri, Jan 14, 2022 at 9:34 AM Daniel Collins <dpcoll...@google.com>
>> wrote:
>>
>>> > In particular the Hadoop/Spark and Kafka dependencies must be
>>> **provided** as they were. I am not sure of others but those three matter.
>>>
>>> I think there's a bit of a difference here between what should be the
>>> state in the short term versus the long term.
>>>
>>> In the short term, I agree that we should avoid changes to how these
>>> dependencies are reflected in the POM.
>>>
>>> In the long term, I don't think it makes sense for these to continue to
>>> be "provided" dependencies: if users wish to use a different version of
>>> Hadoop, Spark, or Kafka, they can explicitly override the dependencies
>>> with the version they want when building their JAR, even if there is a
>>> version listed as "compile" in the POM file on Maven Central. The only
>>> difference is that if they don't have a version preference, the one
>>> listed in the POM (that we tested with) will be used, which seems like an
>>> unambiguous win to me.
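>>>
>>> From the Gradle side that override would look something like this (a
>>> sketch only; coordinates and versions are just for illustration):
>>>
>>>   dependencies {
>>>     implementation 'org.apache.beam:beam-sdks-java-io-kafka:2.36.0'
>>>     // Strictly pin the client version, overriding whatever Beam's pom
>>>     // pulls in transitively:
>>>     implementation('org.apache.kafka:kafka-clients') {
>>>       version { strictly '2.8.1' }
>>>     }
>>>   }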
>>>
>>
>> Agree with the sentiment. But I believe the issue is that some tooling
>> will bundle up the "compile" dependencies and submit them with the job,
>> which will then conflict with the libraries on the cluster. On the other
>> hand, the user will always want to override the "provided" version to
>> match the cluster, in which case the result would just be harmless
>> duplicates on the classpath, no? I guess huge file size, but it isn't the
>> 90s any more. Since Ismaël commented, maybe he can help to clarify. I
>> also knew about this reasoning for Spark & Hadoop but I don't know
>> exactly what is required to make it work right.
>>
>> This could become a bothersome issue long term: the Gradle dev community
>> has lots of posts that indicate they don't agree with the existence of
>> "provided" or "optional" dependencies. (I happen to agree with them, but
>> philosophy is not the point.) We should have a very clear solution for
>> the cases that require one, and document it at least on the wiki.
>>
>> Kenn
>>
>>
>>>
>>> -Daniel
>>>
>>> On Thu, Jan 13, 2022 at 4:19 PM Ismaël Mejía <ieme...@gmail.com> wrote:
>>>
>>>> Optional dependencies should not be a major issue.
>>>>
>>>> What matters for validating that we are not breaking users is to compare
>>>> the generated POM files with the previous (pre-Gradle 7 / 2.35.0)
>>>> version and see that what was provided is still provided.
>>>>
>>>> In particular the Hadoop/Spark and Kafka dependencies must be
>>>> **provided** as they were. I am not sure of others but those three
>>>> matter.
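>>>>
>>>> One quick way to spot-check that (a rough Groovy sketch; the pom
>>>> filename is a hypothetical example):
>>>>
>>>>   // Print the scope of every Hadoop/Spark/Kafka dependency in a pom.
>>>>   def pom = new XmlSlurper(false, false)
>>>>       .parse(new File('beam-sdks-java-io-kafka-2.36.0.pom'))
>>>>   pom.dependencies.dependency
>>>>      .findAll { d -> ['hadoop', 'spark', 'kafka'].any { d.artifactId.text().contains(it) } }
>>>>      .each { d -> println "${d.artifactId.text()} -> ${d.scope.text() ?: 'compile (default)'}" }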
>>>>
>>>> Ismaël
>>>>
>>>> On Wed, Jan 12, 2022 at 10:55 PM Emily Ye <emil...@google.com> wrote:
>>>> >
>>>> > We've chatted offline and have a tentative plan for what to do with
>>>> these dependencies that are currently marked as compileOnly (instead of
>>>> provided). Please review the list if possible [1].
>>>> >
>>>> > Two projects we aren't sure about:
>>>> >
>>>> > :sdks:java:io:hcatalog
>>>> >
>>>> > library.java.jackson_annotations
>>>> > library.java.jackson_core
>>>> > library.java.jackson_databind
>>>> > library.java.hadoop_common
>>>> > org.apache.hive:hive-exec
>>>> > org.apache.hive.hcatalog:hive-hcatalog-core
>>>> >
>>>> > :sdks:java:io:parquet
>>>> >
>>>> > library.java.hadoop_client
>>>> >
>>>> >
>>>> > Does anyone have experience with either of these IOs? ccing Chamikara
>>>> >
>>>> > Thank you,
>>>> > Emily
>>>> >
>>>> >
>>>> > [1]
>>>> https://docs.google.com/spreadsheets/d/1UpeQtx1PoAgeSmpKxZC9lv3B9G1c7cryW3iICfRtG1o/edit?usp=sharing
>>>> >
>>>> > On Tue, Jan 11, 2022 at 6:38 PM Emily Ye <emil...@google.com> wrote:
>>>> >>
>>>> >> As the person volunteering to do fixes for this to unblock Beam
>>>> 2.36.0, I created a spreadsheet of the projects with dependencies changed
>>>> from provided to compileOnly [1]. I pre-filled it with what I think things
>>>> should be, but I don't have much background in Java/Maven/Gradle
>>>> configurations, so please give input!
>>>> >>
>>>> >> Some (mainly Hadoop/Kafka) I left blank, since I'm not sure: do we
>>>> keep them provided because the right version depends on the user's setup?
>>>> >>
>>>> >> [1]
>>>> https://docs.google.com/spreadsheets/d/1UpeQtx1PoAgeSmpKxZC9lv3B9G1c7cryW3iICfRtG1o/edit?usp=sharing
>>>> >>
>>>> >> On Tue, Jan 11, 2022 at 1:17 PM Luke Cwik <lc...@google.com> wrote:
>>>> >>>
>>>> >>> I'm not following what you're trying to say, Kenn, since provided in
>>>> Maven requires the user to explicitly add the dependency themselves to
>>>> have it be part of their runtime.
>>>> >>>
>>>> >>> As per
>>>> https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#dependency-scope
>>>> >>> "
>>>> >>> * provided
>>>> >>> This is much like compile, but indicates you expect the JDK or a
>>>> container to provide the dependency at runtime. For example, when building
>>>> a web application for the Java Enterprise Edition, you would set the
>>>> dependency on the Servlet API and related Java EE APIs to scope provided
>>>> because the web container provides those classes. A dependency with this
>>>> scope is added to the classpath used for compilation and test, but not the
>>>> runtime classpath. It is not transitive."
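>>>> >>>
>>>> >>> In Gradle terms that's roughly the following (a generic sketch with
>>>> >>> a hypothetical servlet-api example, not Beam's actual config):
>>>> >>>
>>>> >>>   dependencies {
>>>> >>>     // Compile classpath only; not transitive, not packaged, and not
>>>> >>>     // on the runtime classpath, so consumers must re-declare it:
>>>> >>>     compileOnly 'javax.servlet:javax.servlet-api:4.0.1'
>>>> >>>     // Maven's "provided" is also on the test classpath; mirror that:
>>>> >>>     testImplementation 'javax.servlet:javax.servlet-api:4.0.1'
>>>> >>>   }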
>>>> >>>
>>>> >>> On Tue, Jan 11, 2022 at 11:54 AM Kenneth Knowles <k...@apache.org>
>>>> wrote:
>>>> >>>>
>>>> >>>> To clarify: "provided" should have been in the test runtime
>>>> configuration, but not in the shipped runtime configuration (otherwise dep
>>>> resolution for users would pull in provided deps, which should not happen).
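>>>> >>>>
>>>> >>>> A sketch of that split, with one of the dependency-map entries
>>>> >>>> mentioned in this thread standing in for any "provided" dep:
>>>> >>>>
>>>> >>>>   dependencies {
>>>> >>>>     compileOnly library.java.hadoop_client     // "provided" in the pom
>>>> >>>>     testRuntimeOnly library.java.hadoop_client // tests can load it,
>>>> >>>>                                                // but it isn't shipped
>>>> >>>>   }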
>>>> >>>>
>>>> >>>> On Thu, Dec 30, 2021 at 10:05 AM Luke Cwik <lc...@google.com>
>>>> wrote:
>>>> >>>>>
>>>> >>>>> During the migration to Gradle 7[1] the propdeps plugin was
>>>> removed[2] since there wasn't a newer version that was compatible with
>>>> Gradle 7 and a replacement couldn't be found. All existing usages of
>>>> "provided" were moved to "compileOnly" and "compileOnly" is being mapped to
>>>> the "provided" maven scope in the generated pom files. This has lead to two
>>>> issues:
>>>> >>>>> 1) provided was also part of the runtime configuration, so we are
>>>> getting a few class not found exceptions when running tests [3]
>>>> >>>>> 2) the generated pom.xml will have a bunch of compile-time-only
>>>> annotations added as provided dependencies in the generated pom files[4]
>>>> >>>>>
>>>> >>>>> #1 can be fixed by adding the dependency to both the
>>>> "compileOnly" and "runtimeOnly" configurations, or by adding the
>>>> dependency to the "implementation" configuration (see the sketch below).
>>>> >>>>> #2 will make the pom files messier, which can lead to confusion
>>>> for users but shouldn't impact existing uses.
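>>>> >>>>>
>>>> >>>>> A minimal sketch of the fix for #1, using one of our dependency-map
>>>> >>>>> entries as an illustrative example (the right configuration per
>>>> >>>>> dependency is exactly what's under discussion):
>>>> >>>>>
>>>> >>>>>   dependencies {
>>>> >>>>>     // compiled against; mapped to "provided" in the generated pom:
>>>> >>>>>     compileOnly library.java.hadoop_client
>>>> >>>>>     // also put it on the runtime classpath so tests can load it:
>>>> >>>>>     runtimeOnly library.java.hadoop_client
>>>> >>>>>     // ...or, alternatively, make it a regular dependency:
>>>> >>>>>     // implementation library.java.hadoop_client
>>>> >>>>>   }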
>>>> >>>>>
>>>> >>>>> There was a suggestion[5] to completely remove the usage of
>>>> provided from the generated pom.xml and have all our previously "provided"
>>>> dependencies declared as "implementation", allowing us to solve both #1
>>>> and #2 above.
>>>> >>>>>
>>>> >>>>> The largest usage of "provided" in the past was for packages
>>>> related to the Hadoop ecosystem; after that it was for packages such as
>>>> junit/hamcrest/aircompressor in sdks/java/core, which aren't required to
>>>> use the module but provide additional features if the dependency exists.
>>>> >>>>>
>>>> >>>>> What, if anything, should we migrate to the "implementation"
>>>> configuration, or should we try to recreate what we were doing with the
>>>> "provided" configuration in the past?
>>>> >>>>>
>>>> >>>>> 1: https://issues.apache.org/jira/browse/BEAM-13430
>>>> >>>>> 2: https://github.com/apache/beam/pull/16308
>>>> >>>>> 3: https://issues.apache.org/jira/browse/BEAM-13569
>>>> >>>>> 4:
>>>> https://github.com/apache/beam/blob/fe456b79419d1a67ebf13d7d4b6695fa1aa6204d/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L964
>>>> >>>>> 5: https://issues.apache.org/jira/browse/BEAM-13504
>>>> >>>>>
>>>>
>>>
