Make sure to include the guava version in the artifact name so that we can
have multiple vendored versions.

On Wed, Jan 31, 2018 at 9:16 PM, Kenneth Knowles <k...@google.com> wrote:

> I didn't have time for this, but it just bit me. We definitely have Guava
> on the API surface of runner support code in ways that get incompatibly
> shaded. I will probably start "1a" by making a shaded library
> org.apache.beam:vendored-guava and starting to use it. It sounds like there
> is generally unanimous support for that much, anyhow.
>
> Kenn
>
> On Wed, Dec 13, 2017 at 8:31 AM, Aljoscha Krettek <aljos...@apache.org>
> wrote:
>
>> Thanks Ismaël for bringing up this discussion again!
>>
>> I would be in favour of 1) and more specifically of 1a)
>>
>> Aljoscha
>>
>>
>> On 12. Dec 2017, at 18:56, Lukasz Cwik <lc...@google.com> wrote:
>>
>> You can always run tests on post shaded artifacts instead of the compiled
>> classes, it just requires us to change our maven surefire / gradle test
>> configurations but it is true that most IDEs would behave better with a
>> dependency jar unless you delegate all the build/test actions to the build
>> system and then it won't matter.
>>
>> On Mon, Dec 11, 2017 at 9:05 PM, Kenneth Knowles <k...@google.com> wrote:
>>
>>> There's also, with additional overhead,
>>>
>>> 1a) A relocated and shipped package for each thing we want to relocate.
>>> I think this has also been tried outside Beam...
>>>
>>> Pros:
>>> * all the pros of 1) plus no bloat beyond what is necessary
>>> Cons:
>>> * abandons whitelist approach for public deps, reverting to blacklist
>>> approach for trouble things like guava, so a bit less principled
>>>
>>> For both 1) and 1a) I would add:
>>>
>>> Pros:
>>> * clearly readable dependency since code will `import
>>> org.apache.beam.private.guava21` and IDEs will understand it is a
>>> distinct lilbrary
>>> * can run tests on unpackaged classes, as long as deps are shaded or
>>> provided as jars
>>> * no mysterious action at a distance from inherited configuration
>>> Cons:
>>> * need to adjust imports
>>>
>>> Kenn
>>>
>>> On Mon, Dec 11, 2017 at 9:57 AM, Lukasz Cwik <lc...@google.com> wrote:
>>>
>>>> I would suggest that either we use:
>>>> 1) A common deps package containing shaded dependencies allows for
>>>> Pros
>>>> * doesn't require the user to build an uber jar
>>>> Risks
>>>> * dependencies package will keep growing even if something is or isn't
>>>> needed by all of Apache Beam leading to a large jar anyways negating any
>>>> space savings
>>>>
>>>> 2) Shade within each module to a common location like
>>>> org.apache.beam.relocated.guava....
>>>> Pros
>>>> * you only get the shaded dependencies of the things that are required
>>>> * its one less dependency for users to manage
>>>> Risks
>>>> * requires an uber jar to be built to get the space savings (either by
>>>> a user or a distribution of Apache Beam) otherwise we negate any space
>>>> savings.
>>>>
>>>> If we either use a common relocation scheme or a dependencies jar then
>>>> each relocation should specifically contain the version number of the
>>>> package because we would like to allow for us to be using two different
>>>> versions of the same library.
>>>>
>>>> For the common deps package approach, should we check in code where the
>>>> imports contain the relocated location (e.g. import
>>>> org.apache.beam.guava.20.0.com.google.common.collect.ImmutableList)?
>>>>
>>>>
>>>> On Mon, Dec 11, 2017 at 8:47 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
>>>> wrote:
>>>>
>>>>> Thanks for bringing that back.
>>>>>
>>>>> Indeed guava is shaded in different uber-jar. Maybe we can have a
>>>>> common deps module that we include once (but the user will have to
>>>>> explicitly define the dep) ?
>>>>>
>>>>> Basically, what do you propose for protobuf (unfortunately, I don't
>>>>> see an obvious) ?
>>>>>
>>>>> Regards
>>>>> JB
>>>>>
>>>>>
>>>>> On 12/11/2017 05:35 PM, Ismaël Mejía wrote:
>>>>>
>>>>>> Hello, I wanted to bring back this subject because I think we should
>>>>>> take action on this and at least first have a shaded version of guava.
>>>>>> I was playing with a toy project and I did the procedure we use to
>>>>>> submit jars to a Hadoop cluster via Flink/Spark which involves
>>>>>> creating an uber jar and I realized that the size of the jar was way
>>>>>> bigger than I expected, and the fact that we shade guava in every
>>>>>> module contributes to this. I found guava shaded on:
>>>>>>
>>>>>> sdks/java/core
>>>>>> runners/core-construction-java
>>>>>> runners/core-java
>>>>>> model/job-management
>>>>>> runners/spark
>>>>>> sdks/java/io/hadoop-file-system
>>>>>> sdks/java/io/kafka
>>>>>>
>>>>>> This means at least 6 times more of the size it should which counts in
>>>>>> around 15MB more (2.4MB*6 deps) of extra weight that we can simply
>>>>>> reduce by using a shaded version of the library.
>>>>>>
>>>>>> I add this point to the previous ones mentioned during the discussion
>>>>>> because this goes against the end user experience and affects us all
>>>>>> (devs/users).
>>>>>>
>>>>>> Another question is if we should shade (and how) protocol buffers
>>>>>> because now with the portability work we are exposing it closer to the
>>>>>> end users. I say this because I also found an issue while running a
>>>>>> job on YARN with the spark runner because hadoop-common includes
>>>>>> protobuf-java 2 and I had to explicitly provide protocol-buffers 3 as
>>>>>> a dependency to be able to use triggers (note the Spark runner
>>>>>> translates them using some method from runners/core-java). Since
>>>>>> hadoop-common is provided in the cluster with the older version of
>>>>>> protobuf, I am afraid that this will bite us in the future.
>>>>>>
>>>>>> Ismaël
>>>>>>
>>>>>> ps. There is already a JIRA for that shading for protobuf on
>>>>>> hadoop-common but this is not coming until version 3 is out.
>>>>>> https://issues.apache.org/jira/browse/HADOOP-13136
>>>>>>
>>>>>> ps2. Extra curious situation is to see that the dataflow-runner ends
>>>>>> up having guava shaded twice via its shaded version on
>>>>>> core-construction-java.
>>>>>>
>>>>>> ps3. Of course this message means a de-facto +1 at least to do it for
>>>>>> guava and evaluate it for other libs.
>>>>>>
>>>>>>
>>>>>> On Tue, Oct 17, 2017 at 7:29 PM, Lukasz Cwik <
>>>>>> lc...@google.com.invalid> wrote:
>>>>>>
>>>>>>> An issue to call out is how to deal with our generated code (.avro
>>>>>>> and
>>>>>>> .proto) as I don't believe those plugins allow you to generate code
>>>>>>> using a
>>>>>>> shaded package prefix on imports.
>>>>>>>
>>>>>>> On Tue, Oct 17, 2017 at 10:28 AM, Thomas Groh <
>>>>>>> tg...@google.com.invalid>
>>>>>>> wrote:
>>>>>>>
>>>>>>> +1 to the goal. I'm hugely in favor of not doing the same shading
>>>>>>>> work
>>>>>>>> every time for dependencies we know we'll use.
>>>>>>>>
>>>>>>>> This also means that if we end up pulling in transitive
>>>>>>>> dependencies we
>>>>>>>> don't want in any particular module we can avoid having to adjust
>>>>>>>> our
>>>>>>>> repackaging strategy for that module - which I have run into
>>>>>>>> face-first in
>>>>>>>> the past.
>>>>>>>>
>>>>>>>> On Tue, Oct 17, 2017 at 9:48 AM, Kenneth Knowles <
>>>>>>>> k...@google.com.invalid>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> Shading is a big part of how we keep our dependencies sane in
>>>>>>>>> Beam. But
>>>>>>>>> downsides: shading is super slow, causes massive jar bloat, and
>>>>>>>>> kind of
>>>>>>>>> hard to get right because artifacts and namespaces are not 1-to-1.
>>>>>>>>>
>>>>>>>>> I know that some communities distribute their own shaded
>>>>>>>>> distributions of
>>>>>>>>> dependencies. I had a thought about doing something similar that I
>>>>>>>>> wanted
>>>>>>>>> to throw out there for people to poke holes in.
>>>>>>>>>
>>>>>>>>> To set the scene, here is how I view shading:
>>>>>>>>>
>>>>>>>>>   - A module has public dependencies and private dependencies.
>>>>>>>>>   - Public deps are used for data interchange; users must share
>>>>>>>>> these
>>>>>>>>>
>>>>>>>> deps.
>>>>>>>>
>>>>>>>>>   - Private deps are just functionality and can be hidden (in our
>>>>>>>>> case,
>>>>>>>>> relocated + bundled)
>>>>>>>>>   - It isn't necessarily that simple, because public and private
>>>>>>>>> deps
>>>>>>>>>
>>>>>>>> might
>>>>>>>>
>>>>>>>>> interact in higher-order ways ("public" is contagious)
>>>>>>>>>
>>>>>>>>> Shading is an implementation detail of expressing these
>>>>>>>>> characteristics.
>>>>>>>>>
>>>>>>>> We
>>>>>>>>
>>>>>>>>> use shading selectively because of its downsides I mentioned above.
>>>>>>>>>
>>>>>>>>> But what about this idea: Introduce shaded deps as a single
>>>>>>>>> separate
>>>>>>>>> artifact.
>>>>>>>>>
>>>>>>>>>   - sdks/java/private-deps: bundled uber jar with relocated
>>>>>>>>> versions of
>>>>>>>>> everything we want to shade
>>>>>>>>>
>>>>>>>>>   - sdks/java/core and sdks/java/harness: no relocation or
>>>>>>>>> bundling -
>>>>>>>>> depends on `beam-sdks-java-private-deps` and imports like
>>>>>>>>> `org.apache.beam.sdk.private.com.google.common` directly (this is
>>>>>>>>> what
>>>>>>>>> they
>>>>>>>>> are rewritten to
>>>>>>>>>
>>>>>>>>> Some benefits
>>>>>>>>>
>>>>>>>>>   - much faster builds of other modules
>>>>>>>>>   - only one shaded uber jar
>>>>>>>>>   - rare/no rebuilds of the uber jar
>>>>>>>>>   - can use maven enforcer to forbid imports like com.google.common
>>>>>>>>>   - configuration all in one place
>>>>>>>>>   - no automated rewriting of our real code, which has led to some
>>>>>>>>> major
>>>>>>>>> confusion
>>>>>>>>>   - easy to implement incrementally
>>>>>>>>>
>>>>>>>>> Downsides:
>>>>>>>>>
>>>>>>>>>   - plenty of effort work to get there
>>>>>>>>>   - unclear how many different such deps modules we need; sharing
>>>>>>>>> them
>>>>>>>>>
>>>>>>>> could
>>>>>>>>
>>>>>>>>> get weird
>>>>>>>>>   - if we hit a roadblock, we will have committed a lot of time
>>>>>>>>>
>>>>>>>>> Just something I was musing as I spent another evening waiting for
>>>>>>>>> slow
>>>>>>>>> builds to try to confirm changes to brittle poms.
>>>>>>>>>
>>>>>>>>> Kenn
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>> --
>>>>> Jean-Baptiste Onofré
>>>>> jbono...@apache.org
>>>>> http://blog.nanthrax.net
>>>>> Talend - http://www.talend.com
>>>>>
>>>>
>>>>
>>>
>>
>>
>

Reply via email to