That is an even better idea. A lot of guava constructs have been superseded
by stuff that was added to Java 8.

On Wed, Jan 31, 2018 at 9:56 PM, Romain Manni-Bucau <rmannibu...@gmail.com>
wrote:

> Why not dropping guava for all beam codebase? With java 8 it is quite easy
> to do it and avoid a bunch of conflicts. Did it in 2 projects with quite a
> good result.
>
> Le 1 févr. 2018 06:50, "Lukasz Cwik" <lc...@google.com> a écrit :
>
>> Make sure to include the guava version in the artifact name so that we
>> can have multiple vendored versions.
>>
>> On Wed, Jan 31, 2018 at 9:16 PM, Kenneth Knowles <k...@google.com> wrote:
>>
>>> I didn't have time for this, but it just bit me. We definitely have
>>> Guava on the API surface of runner support code in ways that get
>>> incompatibly shaded. I will probably start "1a" by making a shaded library
>>> org.apache.beam:vendored-guava and starting to use it. It sounds like there
>>> is generally unanimous support for that much, anyhow.
>>>
>>> Kenn
>>>
>>> On Wed, Dec 13, 2017 at 8:31 AM, Aljoscha Krettek <aljos...@apache.org>
>>> wrote:
>>>
>>>> Thanks Ismaël for bringing up this discussion again!
>>>>
>>>> I would be in favour of 1) and more specifically of 1a)
>>>>
>>>> Aljoscha
>>>>
>>>>
>>>> On 12. Dec 2017, at 18:56, Lukasz Cwik <lc...@google.com> wrote:
>>>>
>>>> You can always run tests on post shaded artifacts instead of the
>>>> compiled classes, it just requires us to change our maven surefire / gradle
>>>> test configurations but it is true that most IDEs would behave better with
>>>> a dependency jar unless you delegate all the build/test actions to the
>>>> build system and then it won't matter.
>>>>
>>>> On Mon, Dec 11, 2017 at 9:05 PM, Kenneth Knowles <k...@google.com>
>>>> wrote:
>>>>
>>>>> There's also, with additional overhead,
>>>>>
>>>>> 1a) A relocated and shipped package for each thing we want to
>>>>> relocate. I think this has also been tried outside Beam...
>>>>>
>>>>> Pros:
>>>>> * all the pros of 1) plus no bloat beyond what is necessary
>>>>> Cons:
>>>>> * abandons whitelist approach for public deps, reverting to blacklist
>>>>> approach for trouble things like guava, so a bit less principled
>>>>>
>>>>> For both 1) and 1a) I would add:
>>>>>
>>>>> Pros:
>>>>> * clearly readable dependency since code will `import
>>>>> org.apache.beam.private.guava21` and IDEs will understand it is a
>>>>> distinct lilbrary
>>>>> * can run tests on unpackaged classes, as long as deps are shaded or
>>>>> provided as jars
>>>>> * no mysterious action at a distance from inherited configuration
>>>>> Cons:
>>>>> * need to adjust imports
>>>>>
>>>>> Kenn
>>>>>
>>>>> On Mon, Dec 11, 2017 at 9:57 AM, Lukasz Cwik <lc...@google.com> wrote:
>>>>>
>>>>>> I would suggest that either we use:
>>>>>> 1) A common deps package containing shaded dependencies allows for
>>>>>> Pros
>>>>>> * doesn't require the user to build an uber jar
>>>>>> Risks
>>>>>> * dependencies package will keep growing even if something is or
>>>>>> isn't needed by all of Apache Beam leading to a large jar anyways 
>>>>>> negating
>>>>>> any space savings
>>>>>>
>>>>>> 2) Shade within each module to a common location like
>>>>>> org.apache.beam.relocated.guava....
>>>>>> Pros
>>>>>> * you only get the shaded dependencies of the things that are required
>>>>>> * its one less dependency for users to manage
>>>>>> Risks
>>>>>> * requires an uber jar to be built to get the space savings (either
>>>>>> by a user or a distribution of Apache Beam) otherwise we negate any space
>>>>>> savings.
>>>>>>
>>>>>> If we either use a common relocation scheme or a dependencies jar
>>>>>> then each relocation should specifically contain the version number of 
>>>>>> the
>>>>>> package because we would like to allow for us to be using two different
>>>>>> versions of the same library.
>>>>>>
>>>>>> For the common deps package approach, should we check in code where
>>>>>> the imports contain the relocated location (e.g. import
>>>>>> org.apache.beam.guava.20.0.com.google.common.collect.ImmutableList)?
>>>>>>
>>>>>>
>>>>>> On Mon, Dec 11, 2017 at 8:47 AM, Jean-Baptiste Onofré <
>>>>>> j...@nanthrax.net> wrote:
>>>>>>
>>>>>>> Thanks for bringing that back.
>>>>>>>
>>>>>>> Indeed guava is shaded in different uber-jar. Maybe we can have a
>>>>>>> common deps module that we include once (but the user will have to
>>>>>>> explicitly define the dep) ?
>>>>>>>
>>>>>>> Basically, what do you propose for protobuf (unfortunately, I don't
>>>>>>> see an obvious) ?
>>>>>>>
>>>>>>> Regards
>>>>>>> JB
>>>>>>>
>>>>>>>
>>>>>>> On 12/11/2017 05:35 PM, Ismaël Mejía wrote:
>>>>>>>
>>>>>>>> Hello, I wanted to bring back this subject because I think we should
>>>>>>>> take action on this and at least first have a shaded version of
>>>>>>>> guava.
>>>>>>>> I was playing with a toy project and I did the procedure we use to
>>>>>>>> submit jars to a Hadoop cluster via Flink/Spark which involves
>>>>>>>> creating an uber jar and I realized that the size of the jar was way
>>>>>>>> bigger than I expected, and the fact that we shade guava in every
>>>>>>>> module contributes to this. I found guava shaded on:
>>>>>>>>
>>>>>>>> sdks/java/core
>>>>>>>> runners/core-construction-java
>>>>>>>> runners/core-java
>>>>>>>> model/job-management
>>>>>>>> runners/spark
>>>>>>>> sdks/java/io/hadoop-file-system
>>>>>>>> sdks/java/io/kafka
>>>>>>>>
>>>>>>>> This means at least 6 times more of the size it should which counts
>>>>>>>> in
>>>>>>>> around 15MB more (2.4MB*6 deps) of extra weight that we can simply
>>>>>>>> reduce by using a shaded version of the library.
>>>>>>>>
>>>>>>>> I add this point to the previous ones mentioned during the
>>>>>>>> discussion
>>>>>>>> because this goes against the end user experience and affects us all
>>>>>>>> (devs/users).
>>>>>>>>
>>>>>>>> Another question is if we should shade (and how) protocol buffers
>>>>>>>> because now with the portability work we are exposing it closer to
>>>>>>>> the
>>>>>>>> end users. I say this because I also found an issue while running a
>>>>>>>> job on YARN with the spark runner because hadoop-common includes
>>>>>>>> protobuf-java 2 and I had to explicitly provide protocol-buffers 3
>>>>>>>> as
>>>>>>>> a dependency to be able to use triggers (note the Spark runner
>>>>>>>> translates them using some method from runners/core-java). Since
>>>>>>>> hadoop-common is provided in the cluster with the older version of
>>>>>>>> protobuf, I am afraid that this will bite us in the future.
>>>>>>>>
>>>>>>>> Ismaël
>>>>>>>>
>>>>>>>> ps. There is already a JIRA for that shading for protobuf on
>>>>>>>> hadoop-common but this is not coming until version 3 is out.
>>>>>>>> https://issues.apache.org/jira/browse/HADOOP-13136
>>>>>>>>
>>>>>>>> ps2. Extra curious situation is to see that the dataflow-runner ends
>>>>>>>> up having guava shaded twice via its shaded version on
>>>>>>>> core-construction-java.
>>>>>>>>
>>>>>>>> ps3. Of course this message means a de-facto +1 at least to do it
>>>>>>>> for
>>>>>>>> guava and evaluate it for other libs.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Oct 17, 2017 at 7:29 PM, Lukasz Cwik <
>>>>>>>> lc...@google.com.invalid> wrote:
>>>>>>>>
>>>>>>>>> An issue to call out is how to deal with our generated code (.avro
>>>>>>>>> and
>>>>>>>>> .proto) as I don't believe those plugins allow you to generate
>>>>>>>>> code using a
>>>>>>>>> shaded package prefix on imports.
>>>>>>>>>
>>>>>>>>> On Tue, Oct 17, 2017 at 10:28 AM, Thomas Groh <
>>>>>>>>> tg...@google.com.invalid>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> +1 to the goal. I'm hugely in favor of not doing the same shading
>>>>>>>>>> work
>>>>>>>>>> every time for dependencies we know we'll use.
>>>>>>>>>>
>>>>>>>>>> This also means that if we end up pulling in transitive
>>>>>>>>>> dependencies we
>>>>>>>>>> don't want in any particular module we can avoid having to adjust
>>>>>>>>>> our
>>>>>>>>>> repackaging strategy for that module - which I have run into
>>>>>>>>>> face-first in
>>>>>>>>>> the past.
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 17, 2017 at 9:48 AM, Kenneth Knowles <
>>>>>>>>>> k...@google.com.invalid>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> Shading is a big part of how we keep our dependencies sane in
>>>>>>>>>>> Beam. But
>>>>>>>>>>> downsides: shading is super slow, causes massive jar bloat, and
>>>>>>>>>>> kind of
>>>>>>>>>>> hard to get right because artifacts and namespaces are not
>>>>>>>>>>> 1-to-1.
>>>>>>>>>>>
>>>>>>>>>>> I know that some communities distribute their own shaded
>>>>>>>>>>> distributions of
>>>>>>>>>>> dependencies. I had a thought about doing something similar that
>>>>>>>>>>> I wanted
>>>>>>>>>>> to throw out there for people to poke holes in.
>>>>>>>>>>>
>>>>>>>>>>> To set the scene, here is how I view shading:
>>>>>>>>>>>
>>>>>>>>>>>   - A module has public dependencies and private dependencies.
>>>>>>>>>>>   - Public deps are used for data interchange; users must share
>>>>>>>>>>> these
>>>>>>>>>>>
>>>>>>>>>> deps.
>>>>>>>>>>
>>>>>>>>>>>   - Private deps are just functionality and can be hidden (in
>>>>>>>>>>> our case,
>>>>>>>>>>> relocated + bundled)
>>>>>>>>>>>   - It isn't necessarily that simple, because public and private
>>>>>>>>>>> deps
>>>>>>>>>>>
>>>>>>>>>> might
>>>>>>>>>>
>>>>>>>>>>> interact in higher-order ways ("public" is contagious)
>>>>>>>>>>>
>>>>>>>>>>> Shading is an implementation detail of expressing these
>>>>>>>>>>> characteristics.
>>>>>>>>>>>
>>>>>>>>>> We
>>>>>>>>>>
>>>>>>>>>>> use shading selectively because of its downsides I mentioned
>>>>>>>>>>> above.
>>>>>>>>>>>
>>>>>>>>>>> But what about this idea: Introduce shaded deps as a single
>>>>>>>>>>> separate
>>>>>>>>>>> artifact.
>>>>>>>>>>>
>>>>>>>>>>>   - sdks/java/private-deps: bundled uber jar with relocated
>>>>>>>>>>> versions of
>>>>>>>>>>> everything we want to shade
>>>>>>>>>>>
>>>>>>>>>>>   - sdks/java/core and sdks/java/harness: no relocation or
>>>>>>>>>>> bundling -
>>>>>>>>>>> depends on `beam-sdks-java-private-deps` and imports like
>>>>>>>>>>> `org.apache.beam.sdk.private.com.google.common` directly (this
>>>>>>>>>>> is what
>>>>>>>>>>> they
>>>>>>>>>>> are rewritten to
>>>>>>>>>>>
>>>>>>>>>>> Some benefits
>>>>>>>>>>>
>>>>>>>>>>>   - much faster builds of other modules
>>>>>>>>>>>   - only one shaded uber jar
>>>>>>>>>>>   - rare/no rebuilds of the uber jar
>>>>>>>>>>>   - can use maven enforcer to forbid imports like
>>>>>>>>>>> com.google.common
>>>>>>>>>>>   - configuration all in one place
>>>>>>>>>>>   - no automated rewriting of our real code, which has led to
>>>>>>>>>>> some major
>>>>>>>>>>> confusion
>>>>>>>>>>>   - easy to implement incrementally
>>>>>>>>>>>
>>>>>>>>>>> Downsides:
>>>>>>>>>>>
>>>>>>>>>>>   - plenty of effort work to get there
>>>>>>>>>>>   - unclear how many different such deps modules we need;
>>>>>>>>>>> sharing them
>>>>>>>>>>>
>>>>>>>>>> could
>>>>>>>>>>
>>>>>>>>>>> get weird
>>>>>>>>>>>   - if we hit a roadblock, we will have committed a lot of time
>>>>>>>>>>>
>>>>>>>>>>> Just something I was musing as I spent another evening waiting
>>>>>>>>>>> for slow
>>>>>>>>>>> builds to try to confirm changes to brittle poms.
>>>>>>>>>>>
>>>>>>>>>>> Kenn
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>> --
>>>>>>> Jean-Baptiste Onofré
>>>>>>> jbono...@apache.org
>>>>>>> http://blog.nanthrax.net
>>>>>>> Talend - http://www.talend.com
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>

Reply via email to