Another couple:

 - User-facing TypeDescriptor is a very thin wrapper on Guava's TypeToken
 - ImmutableList and friends and their builders are very widely used and
IMO still add a lot for readability and preventing someone coming along and
adding mistakes to a codebase

So considering it all, I would keep a vendored Guava (but also move off
where we can, and also have our own improvements). I hope it will be a
near-empty build file to generate it so not a maintenance burden.

Kenn

On Thu, Feb 1, 2018 at 8:44 PM, Kenneth Knowles <k...@google.com> wrote:

> Nice. This sounds like a great idea (or two?) and goes along with what I
> just started for futures.
>
> Guava: filed https://issues.apache.org/jira/browse/BEAM-3606 and assigned
> to Ismaël :-) and converted my futures thing to a subtask.
>
> Specific things for our micro guava:
>
>  - checkArgumentNotNull can throw the right exception
>  - our own Optional because Java's is not Serializable
>  - futures combinators since many are missing, especially Java's don't do
> exceptions right
>
> Protobuf: didn't file an issue because I'm not sure
>
> I was wondering if pre-shading works. We really need to get rid of it from
> public APIs in a 100% reliable way. It is also a problem for Dataflow. I
> was wondering if one approach is to pre-shade gcpio-protobuf-java,
> gcpio-protobuf-java-util, gcpio-grpc-java, etc. Anything that needs to take
> a Message object. (and do the same for beam-model-protobuf-java since the
> model bits have to depend on each other but nothing else).
>
> Kenn
>
> On Thu, Feb 1, 2018 at 1:56 AM, Ismaël Mejía <ieme...@gmail.com> wrote:
>
>> Huge +1 to get rid of Guava!
>>
>> This solves annoying dependency issues for some IOs and allow us to
>> get rid of the shading that makes current jars bigger than they should
>> be.
>>
>> We can create our own 'micro guava' package with some classes for
>> things that are hard to migrate, or that we  prefer to still have like
>> the check* methods for example. Given the size of the task we should
>> probably divide it into subtasks, more important is to get rid of it
>> for 'sdks/java/core'. We can then attack other areas afterwards.
>>
>> Other important idea would be to get rid of Protobuf in public APIs
>> like GCPIO and to better shade it from leaking into the runners. An
>> unexpected side effect of this is a leak of netty via gRPC/protobuf
>> that is byting us for the Spark runner, but well that's worth a
>> different discussion.
>>
>>
>> On Thu, Feb 1, 2018 at 10:08 AM, Romain Manni-Bucau
>> <rmannibu...@gmail.com> wrote:
>> > a map of list is fine and not a challenge we'll face long I hope ;)
>> >
>> >
>> > Romain Manni-Bucau
>> > @rmannibucau |  Blog | Old Blog | Github | LinkedIn
>> >
>> > 2018-02-01 9:40 GMT+01:00 Reuven Lax <re...@google.com>:
>> >>
>> >> Not sure we'll be able to replace them all. Things like guava Table and
>> >> Multimap don't have great replacements in Java8.
>> >>
>> >> On Wed, Jan 31, 2018 at 10:11 PM, Jean-Baptiste Onofré <
>> j...@nanthrax.net>
>> >> wrote:
>> >>>
>> >>> +1, it was on my TODO for a while waiting the Java8 update.
>> >>>
>> >>> Regards
>> >>> JB
>> >>>
>> >>> On 02/01/2018 06:56 AM, Romain Manni-Bucau wrote:
>> >>> > Why not dropping guava for all beam codebase? With java 8 it is
>> quite
>> >>> > easy to do
>> >>> > it and avoid a bunch of conflicts. Did it in 2 projects with quite a
>> >>> > good result.
>> >>> >
>> >>> > Le 1 févr. 2018 06:50, "Lukasz Cwik" <lc...@google.com
>> >>> > <mailto:lc...@google.com>> a écrit :
>> >>> >
>> >>> >     Make sure to include the guava version in the artifact name so
>> that
>> >>> > we can
>> >>> >     have multiple vendored versions.
>> >>> >
>> >>> >     On Wed, Jan 31, 2018 at 9:16 PM, Kenneth Knowles <
>> k...@google.com
>> >>> >     <mailto:k...@google.com>> wrote:
>> >>> >
>> >>> >         I didn't have time for this, but it just bit me. We
>> definitely
>> >>> > have
>> >>> >         Guava on the API surface of runner support code in ways that
>> >>> > get
>> >>> >         incompatibly shaded. I will probably start "1a" by making a
>> >>> > shaded
>> >>> >         library org.apache.beam:vendored-guava and starting to use
>> it.
>> >>> > It sounds
>> >>> >         like there is generally unanimous support for that much,
>> >>> > anyhow.
>> >>> >
>> >>> >         Kenn
>> >>> >
>> >>> >         On Wed, Dec 13, 2017 at 8:31 AM, Aljoscha Krettek
>> >>> > <aljos...@apache.org
>> >>> >         <mailto:aljos...@apache.org>> wrote:
>> >>> >
>> >>> >             Thanks Ismaël for bringing up this discussion again!
>> >>> >
>> >>> >             I would be in favour of 1) and more specifically of 1a)
>> >>> >
>> >>> >             Aljoscha
>> >>> >
>> >>> >
>> >>> >>             On 12. Dec 2017, at 18:56, Lukasz Cwik <
>> lc...@google.com
>> >>> >>             <mailto:lc...@google.com>> wrote:
>> >>> >>
>> >>> >>             You can always run tests on post shaded artifacts
>> instead
>> >>> >> of the
>> >>> >>             compiled classes, it just requires us to change our
>> maven
>> >>> >> surefire
>> >>> >>             / gradle test configurations but it is true that most
>> IDEs
>> >>> >> would
>> >>> >>             behave better with a dependency jar unless you delegate
>> >>> >> all the
>> >>> >>             build/test actions to the build system and then it
>> won't
>> >>> >> matter.
>> >>> >>
>> >>> >>             On Mon, Dec 11, 2017 at 9:05 PM, Kenneth Knowles
>> >>> >> <k...@google.com
>> >>> >>             <mailto:k...@google.com>> wrote:
>> >>> >>
>> >>> >>                 There's also, with additional overhead,
>> >>> >>
>> >>> >>                 1a) A relocated and shipped package for each thing
>> we
>> >>> >> want to
>> >>> >>                 relocate. I think this has also been tried outside
>> >>> >> Beam...
>> >>> >>
>> >>> >>                 Pros:
>> >>> >>                 * all the pros of 1) plus no bloat beyond what is
>> >>> >> necessary
>> >>> >>                 Cons:
>> >>> >>                 * abandons whitelist approach for public deps,
>> >>> >> reverting to
>> >>> >>                 blacklist approach for trouble things like guava,
>> so a
>> >>> >> bit
>> >>> >>                 less principled
>> >>> >>
>> >>> >>                 For both 1) and 1a) I would add:
>> >>> >>
>> >>> >>                 Pros:
>> >>> >>                 * clearly readable dependency since code will
>> `import
>> >>> >>                 org.apache.beam.private.guava21` and IDEs will
>> >>> >> understand it
>> >>> >>                 is a distinct lilbrary
>> >>> >>                 * can run tests on unpackaged classes, as long as
>> deps
>> >>> >> are
>> >>> >>                 shaded or provided as jars
>> >>> >>                 * no mysterious action at a distance from inherited
>> >>> >> configuration
>> >>> >>                 Cons:
>> >>> >>                 * need to adjust imports
>> >>> >>
>> >>> >>                 Kenn
>> >>> >>
>> >>> >>                 On Mon, Dec 11, 2017 at 9:57 AM, Lukasz Cwik
>> >>> >> <lc...@google.com
>> >>> >>                 <mailto:lc...@google.com>> wrote:
>> >>> >>
>> >>> >>                     I would suggest that either we use:
>> >>> >>                     1) A common deps package containing shaded
>> >>> >> dependencies
>> >>> >>                     allows for
>> >>> >>                     Pros
>> >>> >>                     * doesn't require the user to build an uber jar
>> >>> >>                     Risks
>> >>> >>                     * dependencies package will keep growing even
>> if
>> >>> >> something
>> >>> >>                     is or isn't needed by all of Apache Beam
>> leading
>> >>> >> to a
>> >>> >>                     large jar anyways negating any space savings
>> >>> >>
>> >>> >>                     2) Shade within each module to a common
>> location
>> >>> >> like
>> >>> >>                     org.apache.beam.relocated.guava....
>> >>> >>                     Pros
>> >>> >>                     * you only get the shaded dependencies of the
>> >>> >> things that
>> >>> >>                     are required
>> >>> >>                     * its one less dependency for users to manage
>> >>> >>                     Risks
>> >>> >>                     * requires an uber jar to be built to get the
>> >>> >> space
>> >>> >>                     savings (either by a user or a distribution of
>> >>> >> Apache
>> >>> >>                     Beam) otherwise we negate any space savings.
>> >>> >>
>> >>> >>                     If we either use a common relocation scheme or
>> a
>> >>> >>                     dependencies jar then each relocation should
>> >>> >> specifically
>> >>> >>                     contain the version number of the package
>> because
>> >>> >> we would
>> >>> >>                     like to allow for us to be using two different
>> >>> >> versions of
>> >>> >>                     the same library.
>> >>> >>
>> >>> >>                     For the common deps package approach, should we
>> >>> >> check in
>> >>> >>                     code where the imports contain the relocated
>> >>> >> location
>> >>> >>                     (e.g. import org.apache.beam.guava.20.0.com
>> >>> >>
>> >>> >> <http://org.apache.beam.guava.20.0.com/>.google.common.colle
>> ct.ImmutableList)?
>> >>> >>
>> >>> >>
>> >>> >>                     On Mon, Dec 11, 2017 at 8:47 AM, Jean-Baptiste
>> >>> >> Onofré
>> >>> >>                     <j...@nanthrax.net <mailto:j...@nanthrax.net>>
>> wrote:
>> >>> >>
>> >>> >>                         Thanks for bringing that back.
>> >>> >>
>> >>> >>                         Indeed guava is shaded in different
>> uber-jar.
>> >>> >> Maybe we
>> >>> >>                         can have a common deps module that we
>> include
>> >>> >> once
>> >>> >>                         (but the user will have to explicitly
>> define
>> >>> >> the dep) ?
>> >>> >>
>> >>> >>                         Basically, what do you propose for protobuf
>> >>> >>                         (unfortunately, I don't see an obvious) ?
>> >>> >>
>> >>> >>                         Regards
>> >>> >>                         JB
>> >>> >>
>> >>> >>
>> >>> >>                         On 12/11/2017 05:35 PM, Ismaël Mejía wrote:
>> >>> >>
>> >>> >>                             Hello, I wanted to bring back this
>> subject
>> >>> >> because
>> >>> >>                             I think we should
>> >>> >>                             take action on this and at least first
>> >>> >> have a
>> >>> >>                             shaded version of guava.
>> >>> >>                             I was playing with a toy project and I
>> did
>> >>> >> the
>> >>> >>                             procedure we use to
>> >>> >>                             submit jars to a Hadoop cluster via
>> >>> >> Flink/Spark
>> >>> >>                             which involves
>> >>> >>                             creating an uber jar and I realized
>> that
>> >>> >> the size
>> >>> >>                             of the jar was way
>> >>> >>                             bigger than I expected, and the fact
>> that
>> >>> >> we shade
>> >>> >>                             guava in every
>> >>> >>                             module contributes to this. I found
>> guava
>> >>> >> shaded on:
>> >>> >>
>> >>> >>                             sdks/java/core
>> >>> >>                             runners/core-construction-java
>> >>> >>                             runners/core-java
>> >>> >>                             model/job-management
>> >>> >>                             runners/spark
>> >>> >>                             sdks/java/io/hadoop-file-system
>> >>> >>                             sdks/java/io/kafka
>> >>> >>
>> >>> >>                             This means at least 6 times more of the
>> >>> >> size it
>> >>> >>                             should which counts in
>> >>> >>                             around 15MB more (2.4MB*6 deps) of
>> extra
>> >>> >> weight
>> >>> >>                             that we can simply
>> >>> >>                             reduce by using a shaded version of the
>> >>> >> library.
>> >>> >>
>> >>> >>                             I add this point to the previous ones
>> >>> >> mentioned
>> >>> >>                             during the discussion
>> >>> >>                             because this goes against the end user
>> >>> >> experience
>> >>> >>                             and affects us all
>> >>> >>                             (devs/users).
>> >>> >>
>> >>> >>                             Another question is if we should shade
>> >>> >> (and how)
>> >>> >>                             protocol buffers
>> >>> >>                             because now with the portability work
>> we
>> >>> >> are
>> >>> >>                             exposing it closer to the
>> >>> >>                             end users. I say this because I also
>> found
>> >>> >> an
>> >>> >>                             issue while running a
>> >>> >>                             job on YARN with the spark runner
>> because
>> >>> >>                             hadoop-common includes
>> >>> >>                             protobuf-java 2 and I had to explicitly
>> >>> >> provide
>> >>> >>                             protocol-buffers 3 as
>> >>> >>                             a dependency to be able to use triggers
>> >>> >> (note the
>> >>> >>                             Spark runner
>> >>> >>                             translates them using some method from
>> >>> >>                             runners/core-java). Since
>> >>> >>                             hadoop-common is provided in the
>> cluster
>> >>> >> with the
>> >>> >>                             older version of
>> >>> >>                             protobuf, I am afraid that this will
>> bite
>> >>> >> us in
>> >>> >>                             the future.
>> >>> >>
>> >>> >>                             Ismaël
>> >>> >>
>> >>> >>                             ps. There is already a JIRA for that
>> >>> >> shading for
>> >>> >>                             protobuf on
>> >>> >>                             hadoop-common but this is not coming
>> until
>> >>> >> version
>> >>> >>                             3 is out.
>> >>> >>
>> >>> >> https://issues.apache.org/jira/browse/HADOOP-13136
>> >>> >>
>> >>> >> <https://issues.apache.org/jira/browse/HADOOP-13136>
>> >>> >>
>> >>> >>                             ps2. Extra curious situation is to see
>> >>> >> that the
>> >>> >>                             dataflow-runner ends
>> >>> >>                             up having guava shaded twice via its
>> >>> >> shaded version on
>> >>> >>                             core-construction-java.
>> >>> >>
>> >>> >>                             ps3. Of course this message means a
>> >>> >> de-facto +1 at
>> >>> >>                             least to do it for
>> >>> >>                             guava and evaluate it for other libs.
>> >>> >>
>> >>> >>
>> >>> >>                             On Tue, Oct 17, 2017 at 7:29 PM, Lukasz
>> >>> >> Cwik
>> >>> >>                             <lc...@google.com.invalid
>> >>> >>                             <mailto:lc...@google.com.invalid>>
>> wrote:
>> >>> >>
>> >>> >>                                 An issue to call out is how to deal
>> >>> >> with our
>> >>> >>                                 generated code (.avro and
>> >>> >>                                 .proto) as I don't believe those
>> >>> >> plugins allow
>> >>> >>                                 you to generate code using a
>> >>> >>                                 shaded package prefix on imports.
>> >>> >>
>> >>> >>                                 On Tue, Oct 17, 2017 at 10:28 AM,
>> >>> >> Thomas Groh
>> >>> >>                                 <tg...@google.com.invalid
>> >>> >>                                 <mailto:tg...@google.com.invalid>>
>> >>> >>                                 wrote:
>> >>> >>
>> >>> >>                                     +1 to the goal. I'm hugely in
>> >>> >> favor of not
>> >>> >>                                     doing the same shading work
>> >>> >>                                     every time for dependencies we
>> >>> >> know we'll use.
>> >>> >>
>> >>> >>                                     This also means that if we end
>> up
>> >>> >> pulling
>> >>> >>                                     in transitive dependencies we
>> >>> >>                                     don't want in any particular
>> >>> >> module we can
>> >>> >>                                     avoid having to adjust our
>> >>> >>                                     repackaging strategy for that
>> >>> >> module -
>> >>> >>                                     which I have run into
>> face-first
>> >>> >> in
>> >>> >>                                     the past.
>> >>> >>
>> >>> >>                                     On Tue, Oct 17, 2017 at 9:48
>> AM,
>> >>> >> Kenneth
>> >>> >>                                     Knowles <k...@google.com.invalid
>> >>> >>                                     <mailto:k...@google.com.invalid
>> >>
>> >>> >>                                     wrote:
>> >>> >>
>> >>> >>                                         Hi all,
>> >>> >>
>> >>> >>                                         Shading is a big part of
>> how
>> >>> >> we keep
>> >>> >>                                         our dependencies sane in
>> Beam.
>> >>> >> But
>> >>> >>                                         downsides: shading is super
>> >>> >> slow,
>> >>> >>                                         causes massive jar bloat,
>> and
>> >>> >> kind of
>> >>> >>                                         hard to get right because
>> >>> >> artifacts
>> >>> >>                                         and namespaces are not
>> 1-to-1.
>> >>> >>
>> >>> >>                                         I know that some
>> communities
>> >>> >>                                         distribute their own shaded
>> >>> >>                                         distributions of
>> >>> >>                                         dependencies. I had a
>> thought
>> >>> >> about
>> >>> >>                                         doing something similar
>> that I
>> >>> >> wanted
>> >>> >>                                         to throw out there for
>> people
>> >>> >> to poke
>> >>> >>                                         holes in.
>> >>> >>
>> >>> >>                                         To set the scene, here is
>> how
>> >>> >> I view
>> >>> >>                                         shading:
>> >>> >>
>> >>> >>                                           - A module has public
>> >>> >> dependencies
>> >>> >>                                         and private dependencies.
>> >>> >>                                           - Public deps are used
>> for
>> >>> >> data
>> >>> >>                                         interchange; users must
>> share
>> >>> >> these
>> >>> >>
>> >>> >>                                     deps.
>> >>> >>
>> >>> >>                                           - Private deps are just
>> >>> >>                                         functionality and can be
>> >>> >> hidden (in
>> >>> >>                                         our case,
>> >>> >>                                         relocated + bundled)
>> >>> >>                                           - It isn't necessarily
>> that
>> >>> >> simple,
>> >>> >>                                         because public and private
>> >>> >> deps
>> >>> >>
>> >>> >>                                     might
>> >>> >>
>> >>> >>                                         interact in higher-order
>> ways
>> >>> >>                                         ("public" is contagious)
>> >>> >>
>> >>> >>                                         Shading is an
>> implementation
>> >>> >> detail of
>> >>> >>                                         expressing these
>> >>> >> characteristics.
>> >>> >>
>> >>> >>                                     We
>> >>> >>
>> >>> >>                                         use shading selectively
>> >>> >> because of its
>> >>> >>                                         downsides I mentioned
>> above.
>> >>> >>
>> >>> >>                                         But what about this idea:
>> >>> >> Introduce
>> >>> >>                                         shaded deps as a single
>> >>> >> separate
>> >>> >>                                         artifact.
>> >>> >>
>> >>> >>                                           - sdks/java/private-deps:
>> >>> >> bundled
>> >>> >>                                         uber jar with relocated
>> >>> >> versions of
>> >>> >>                                         everything we want to shade
>> >>> >>
>> >>> >>                                           - sdks/java/core and
>> >>> >>                                         sdks/java/harness: no
>> >>> >> relocation or
>> >>> >>                                         bundling -
>> >>> >>                                         depends on
>> >>> >>
>>  `beam-sdks-java-private-deps`
>> >>> >> and
>> >>> >>                                         imports like
>> >>> >>
>> >>> >> `org.apache.beam.sdk.private.com.google.common`
>> >>> >>                                         directly (this is what
>> >>> >>                                         they
>> >>> >>                                         are rewritten to
>> >>> >>
>> >>> >>                                         Some benefits
>> >>> >>
>> >>> >>                                           - much faster builds of
>> >>> >> other modules
>> >>> >>                                           - only one shaded uber
>> jar
>> >>> >>                                           - rare/no rebuilds of the
>> >>> >> uber jar
>> >>> >>                                           - can use maven enforcer
>> to
>> >>> >> forbid
>> >>> >>                                         imports like
>> com.google.common
>> >>> >>                                           - configuration all in
>> one
>> >>> >> place
>> >>> >>                                           - no automated rewriting
>> of
>> >>> >> our real
>> >>> >>                                         code, which has led to some
>> >>> >> major
>> >>> >>                                         confusion
>> >>> >>                                           - easy to implement
>> >>> >> incrementally
>> >>> >>
>> >>> >>                                         Downsides:
>> >>> >>
>> >>> >>                                           - plenty of effort work
>> to
>> >>> >> get there
>> >>> >>                                           - unclear how many
>> different
>> >>> >> such
>> >>> >>                                         deps modules we need;
>> sharing
>> >>> >> them
>> >>> >>
>> >>> >>                                     could
>> >>> >>
>> >>> >>                                         get weird
>> >>> >>                                           - if we hit a roadblock,
>> we
>> >>> >> will
>> >>> >>                                         have committed a lot of
>> time
>> >>> >>
>> >>> >>                                         Just something I was
>> musing as
>> >>> >> I spent
>> >>> >>                                         another evening waiting for
>> >>> >> slow
>> >>> >>                                         builds to try to confirm
>> >>> >> changes to
>> >>> >>                                         brittle poms.
>> >>> >>
>> >>> >>                                         Kenn
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >>                         --
>> >>> >>                         Jean-Baptiste Onofré
>> >>> >>                         jbono...@apache.org
>> >>> >> <mailto:jbono...@apache.org>
>> >>> >>                         http://blog.nanthrax.net
>> >>> >> <http://blog.nanthrax.net/>
>> >>> >>                         Talend - http://www.talend.com
>> >>> >> <http://www.talend.com/>
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >
>> >>> >
>> >>> >
>> >>>
>> >>> --
>> >>> Jean-Baptiste Onofré
>> >>> jbono...@apache.org
>> >>> http://blog.nanthrax.net
>> >>> Talend - http://www.talend.com
>> >>
>> >>
>> >
>>
>
>

Reply via email to