Make sure to include the guava version in the artifact name so that we can have multiple vendored versions.
On Wed, Jan 31, 2018 at 9:16 PM, Kenneth Knowles <k...@google.com> wrote: > I didn't have time for this, but it just bit me. We definitely have Guava > on the API surface of runner support code in ways that get incompatibly > shaded. I will probably start "1a" by making a shaded library > org.apache.beam:vendored-guava and starting to use it. It sounds like there > is generally unanimous support for that much, anyhow. > > Kenn > > On Wed, Dec 13, 2017 at 8:31 AM, Aljoscha Krettek <aljos...@apache.org> > wrote: > >> Thanks Ismaël for bringing up this discussion again! >> >> I would be in favour of 1) and more specifically of 1a) >> >> Aljoscha >> >> >> On 12. Dec 2017, at 18:56, Lukasz Cwik <lc...@google.com> wrote: >> >> You can always run tests on post shaded artifacts instead of the compiled >> classes, it just requires us to change our maven surefire / gradle test >> configurations but it is true that most IDEs would behave better with a >> dependency jar unless you delegate all the build/test actions to the build >> system and then it won't matter. >> >> On Mon, Dec 11, 2017 at 9:05 PM, Kenneth Knowles <k...@google.com> wrote: >> >>> There's also, with additional overhead, >>> >>> 1a) A relocated and shipped package for each thing we want to relocate. >>> I think this has also been tried outside Beam... >>> >>> Pros: >>> * all the pros of 1) plus no bloat beyond what is necessary >>> Cons: >>> * abandons whitelist approach for public deps, reverting to blacklist >>> approach for trouble things like guava, so a bit less principled >>> >>> For both 1) and 1a) I would add: >>> >>> Pros: >>> * clearly readable dependency since code will `import >>> org.apache.beam.private.guava21` and IDEs will understand it is a >>> distinct lilbrary >>> * can run tests on unpackaged classes, as long as deps are shaded or >>> provided as jars >>> * no mysterious action at a distance from inherited configuration >>> Cons: >>> * need to adjust imports >>> >>> Kenn >>> >>> On Mon, Dec 11, 2017 at 9:57 AM, Lukasz Cwik <lc...@google.com> wrote: >>> >>>> I would suggest that either we use: >>>> 1) A common deps package containing shaded dependencies allows for >>>> Pros >>>> * doesn't require the user to build an uber jar >>>> Risks >>>> * dependencies package will keep growing even if something is or isn't >>>> needed by all of Apache Beam leading to a large jar anyways negating any >>>> space savings >>>> >>>> 2) Shade within each module to a common location like >>>> org.apache.beam.relocated.guava.... >>>> Pros >>>> * you only get the shaded dependencies of the things that are required >>>> * its one less dependency for users to manage >>>> Risks >>>> * requires an uber jar to be built to get the space savings (either by >>>> a user or a distribution of Apache Beam) otherwise we negate any space >>>> savings. >>>> >>>> If we either use a common relocation scheme or a dependencies jar then >>>> each relocation should specifically contain the version number of the >>>> package because we would like to allow for us to be using two different >>>> versions of the same library. >>>> >>>> For the common deps package approach, should we check in code where the >>>> imports contain the relocated location (e.g. import >>>> org.apache.beam.guava.20.0.com.google.common.collect.ImmutableList)? >>>> >>>> >>>> On Mon, Dec 11, 2017 at 8:47 AM, Jean-Baptiste Onofré <j...@nanthrax.net> >>>> wrote: >>>> >>>>> Thanks for bringing that back. >>>>> >>>>> Indeed guava is shaded in different uber-jar. Maybe we can have a >>>>> common deps module that we include once (but the user will have to >>>>> explicitly define the dep) ? >>>>> >>>>> Basically, what do you propose for protobuf (unfortunately, I don't >>>>> see an obvious) ? >>>>> >>>>> Regards >>>>> JB >>>>> >>>>> >>>>> On 12/11/2017 05:35 PM, Ismaël Mejía wrote: >>>>> >>>>>> Hello, I wanted to bring back this subject because I think we should >>>>>> take action on this and at least first have a shaded version of guava. >>>>>> I was playing with a toy project and I did the procedure we use to >>>>>> submit jars to a Hadoop cluster via Flink/Spark which involves >>>>>> creating an uber jar and I realized that the size of the jar was way >>>>>> bigger than I expected, and the fact that we shade guava in every >>>>>> module contributes to this. I found guava shaded on: >>>>>> >>>>>> sdks/java/core >>>>>> runners/core-construction-java >>>>>> runners/core-java >>>>>> model/job-management >>>>>> runners/spark >>>>>> sdks/java/io/hadoop-file-system >>>>>> sdks/java/io/kafka >>>>>> >>>>>> This means at least 6 times more of the size it should which counts in >>>>>> around 15MB more (2.4MB*6 deps) of extra weight that we can simply >>>>>> reduce by using a shaded version of the library. >>>>>> >>>>>> I add this point to the previous ones mentioned during the discussion >>>>>> because this goes against the end user experience and affects us all >>>>>> (devs/users). >>>>>> >>>>>> Another question is if we should shade (and how) protocol buffers >>>>>> because now with the portability work we are exposing it closer to the >>>>>> end users. I say this because I also found an issue while running a >>>>>> job on YARN with the spark runner because hadoop-common includes >>>>>> protobuf-java 2 and I had to explicitly provide protocol-buffers 3 as >>>>>> a dependency to be able to use triggers (note the Spark runner >>>>>> translates them using some method from runners/core-java). Since >>>>>> hadoop-common is provided in the cluster with the older version of >>>>>> protobuf, I am afraid that this will bite us in the future. >>>>>> >>>>>> Ismaël >>>>>> >>>>>> ps. There is already a JIRA for that shading for protobuf on >>>>>> hadoop-common but this is not coming until version 3 is out. >>>>>> https://issues.apache.org/jira/browse/HADOOP-13136 >>>>>> >>>>>> ps2. Extra curious situation is to see that the dataflow-runner ends >>>>>> up having guava shaded twice via its shaded version on >>>>>> core-construction-java. >>>>>> >>>>>> ps3. Of course this message means a de-facto +1 at least to do it for >>>>>> guava and evaluate it for other libs. >>>>>> >>>>>> >>>>>> On Tue, Oct 17, 2017 at 7:29 PM, Lukasz Cwik < >>>>>> lc...@google.com.invalid> wrote: >>>>>> >>>>>>> An issue to call out is how to deal with our generated code (.avro >>>>>>> and >>>>>>> .proto) as I don't believe those plugins allow you to generate code >>>>>>> using a >>>>>>> shaded package prefix on imports. >>>>>>> >>>>>>> On Tue, Oct 17, 2017 at 10:28 AM, Thomas Groh < >>>>>>> tg...@google.com.invalid> >>>>>>> wrote: >>>>>>> >>>>>>> +1 to the goal. I'm hugely in favor of not doing the same shading >>>>>>>> work >>>>>>>> every time for dependencies we know we'll use. >>>>>>>> >>>>>>>> This also means that if we end up pulling in transitive >>>>>>>> dependencies we >>>>>>>> don't want in any particular module we can avoid having to adjust >>>>>>>> our >>>>>>>> repackaging strategy for that module - which I have run into >>>>>>>> face-first in >>>>>>>> the past. >>>>>>>> >>>>>>>> On Tue, Oct 17, 2017 at 9:48 AM, Kenneth Knowles < >>>>>>>> k...@google.com.invalid> >>>>>>>> wrote: >>>>>>>> >>>>>>>> Hi all, >>>>>>>>> >>>>>>>>> Shading is a big part of how we keep our dependencies sane in >>>>>>>>> Beam. But >>>>>>>>> downsides: shading is super slow, causes massive jar bloat, and >>>>>>>>> kind of >>>>>>>>> hard to get right because artifacts and namespaces are not 1-to-1. >>>>>>>>> >>>>>>>>> I know that some communities distribute their own shaded >>>>>>>>> distributions of >>>>>>>>> dependencies. I had a thought about doing something similar that I >>>>>>>>> wanted >>>>>>>>> to throw out there for people to poke holes in. >>>>>>>>> >>>>>>>>> To set the scene, here is how I view shading: >>>>>>>>> >>>>>>>>> - A module has public dependencies and private dependencies. >>>>>>>>> - Public deps are used for data interchange; users must share >>>>>>>>> these >>>>>>>>> >>>>>>>> deps. >>>>>>>> >>>>>>>>> - Private deps are just functionality and can be hidden (in our >>>>>>>>> case, >>>>>>>>> relocated + bundled) >>>>>>>>> - It isn't necessarily that simple, because public and private >>>>>>>>> deps >>>>>>>>> >>>>>>>> might >>>>>>>> >>>>>>>>> interact in higher-order ways ("public" is contagious) >>>>>>>>> >>>>>>>>> Shading is an implementation detail of expressing these >>>>>>>>> characteristics. >>>>>>>>> >>>>>>>> We >>>>>>>> >>>>>>>>> use shading selectively because of its downsides I mentioned above. >>>>>>>>> >>>>>>>>> But what about this idea: Introduce shaded deps as a single >>>>>>>>> separate >>>>>>>>> artifact. >>>>>>>>> >>>>>>>>> - sdks/java/private-deps: bundled uber jar with relocated >>>>>>>>> versions of >>>>>>>>> everything we want to shade >>>>>>>>> >>>>>>>>> - sdks/java/core and sdks/java/harness: no relocation or >>>>>>>>> bundling - >>>>>>>>> depends on `beam-sdks-java-private-deps` and imports like >>>>>>>>> `org.apache.beam.sdk.private.com.google.common` directly (this is >>>>>>>>> what >>>>>>>>> they >>>>>>>>> are rewritten to >>>>>>>>> >>>>>>>>> Some benefits >>>>>>>>> >>>>>>>>> - much faster builds of other modules >>>>>>>>> - only one shaded uber jar >>>>>>>>> - rare/no rebuilds of the uber jar >>>>>>>>> - can use maven enforcer to forbid imports like com.google.common >>>>>>>>> - configuration all in one place >>>>>>>>> - no automated rewriting of our real code, which has led to some >>>>>>>>> major >>>>>>>>> confusion >>>>>>>>> - easy to implement incrementally >>>>>>>>> >>>>>>>>> Downsides: >>>>>>>>> >>>>>>>>> - plenty of effort work to get there >>>>>>>>> - unclear how many different such deps modules we need; sharing >>>>>>>>> them >>>>>>>>> >>>>>>>> could >>>>>>>> >>>>>>>>> get weird >>>>>>>>> - if we hit a roadblock, we will have committed a lot of time >>>>>>>>> >>>>>>>>> Just something I was musing as I spent another evening waiting for >>>>>>>>> slow >>>>>>>>> builds to try to confirm changes to brittle poms. >>>>>>>>> >>>>>>>>> Kenn >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>> -- >>>>> Jean-Baptiste Onofré >>>>> jbono...@apache.org >>>>> http://blog.nanthrax.net >>>>> Talend - http://www.talend.com >>>>> >>>> >>>> >>> >> >> >