Nice. This sounds like a great idea (or two?) and goes along with what I
just started for futures.

Guava: filed https://issues.apache.org/jira/browse/BEAM-3606 and assigned
to Ismaël :-) and converted my futures thing to a subtask.

Specific things for our micro guava:

 - checkArgumentNotNull can throw the right exception
 - our own Optional because Java's is not Serializable
 - futures combinators since many are missing, especially Java's don't do
exceptions right

Protobuf: didn't file an issue because I'm not sure

I was wondering if pre-shading works. We really need to get rid of it from
public APIs in a 100% reliable way. It is also a problem for Dataflow. I
was wondering if one approach is to pre-shade gcpio-protobuf-java,
gcpio-protobuf-java-util, gcpio-grpc-java, etc. Anything that needs to take
a Message object. (and do the same for beam-model-protobuf-java since the
model bits have to depend on each other but nothing else).

Kenn

On Thu, Feb 1, 2018 at 1:56 AM, Ismaël Mejía <ieme...@gmail.com> wrote:

> Huge +1 to get rid of Guava!
>
> This solves annoying dependency issues for some IOs and allow us to
> get rid of the shading that makes current jars bigger than they should
> be.
>
> We can create our own 'micro guava' package with some classes for
> things that are hard to migrate, or that we  prefer to still have like
> the check* methods for example. Given the size of the task we should
> probably divide it into subtasks, more important is to get rid of it
> for 'sdks/java/core'. We can then attack other areas afterwards.
>
> Other important idea would be to get rid of Protobuf in public APIs
> like GCPIO and to better shade it from leaking into the runners. An
> unexpected side effect of this is a leak of netty via gRPC/protobuf
> that is byting us for the Spark runner, but well that's worth a
> different discussion.
>
>
> On Thu, Feb 1, 2018 at 10:08 AM, Romain Manni-Bucau
> <rmannibu...@gmail.com> wrote:
> > a map of list is fine and not a challenge we'll face long I hope ;)
> >
> >
> > Romain Manni-Bucau
> > @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> >
> > 2018-02-01 9:40 GMT+01:00 Reuven Lax <re...@google.com>:
> >>
> >> Not sure we'll be able to replace them all. Things like guava Table and
> >> Multimap don't have great replacements in Java8.
> >>
> >> On Wed, Jan 31, 2018 at 10:11 PM, Jean-Baptiste Onofré <j...@nanthrax.net
> >
> >> wrote:
> >>>
> >>> +1, it was on my TODO for a while waiting the Java8 update.
> >>>
> >>> Regards
> >>> JB
> >>>
> >>> On 02/01/2018 06:56 AM, Romain Manni-Bucau wrote:
> >>> > Why not dropping guava for all beam codebase? With java 8 it is quite
> >>> > easy to do
> >>> > it and avoid a bunch of conflicts. Did it in 2 projects with quite a
> >>> > good result.
> >>> >
> >>> > Le 1 févr. 2018 06:50, "Lukasz Cwik" <lc...@google.com
> >>> > <mailto:lc...@google.com>> a écrit :
> >>> >
> >>> >     Make sure to include the guava version in the artifact name so
> that
> >>> > we can
> >>> >     have multiple vendored versions.
> >>> >
> >>> >     On Wed, Jan 31, 2018 at 9:16 PM, Kenneth Knowles <k...@google.com
> >>> >     <mailto:k...@google.com>> wrote:
> >>> >
> >>> >         I didn't have time for this, but it just bit me. We
> definitely
> >>> > have
> >>> >         Guava on the API surface of runner support code in ways that
> >>> > get
> >>> >         incompatibly shaded. I will probably start "1a" by making a
> >>> > shaded
> >>> >         library org.apache.beam:vendored-guava and starting to use
> it.
> >>> > It sounds
> >>> >         like there is generally unanimous support for that much,
> >>> > anyhow.
> >>> >
> >>> >         Kenn
> >>> >
> >>> >         On Wed, Dec 13, 2017 at 8:31 AM, Aljoscha Krettek
> >>> > <aljos...@apache.org
> >>> >         <mailto:aljos...@apache.org>> wrote:
> >>> >
> >>> >             Thanks Ismaël for bringing up this discussion again!
> >>> >
> >>> >             I would be in favour of 1) and more specifically of 1a)
> >>> >
> >>> >             Aljoscha
> >>> >
> >>> >
> >>> >>             On 12. Dec 2017, at 18:56, Lukasz Cwik <
> lc...@google.com
> >>> >>             <mailto:lc...@google.com>> wrote:
> >>> >>
> >>> >>             You can always run tests on post shaded artifacts
> instead
> >>> >> of the
> >>> >>             compiled classes, it just requires us to change our
> maven
> >>> >> surefire
> >>> >>             / gradle test configurations but it is true that most
> IDEs
> >>> >> would
> >>> >>             behave better with a dependency jar unless you delegate
> >>> >> all the
> >>> >>             build/test actions to the build system and then it won't
> >>> >> matter.
> >>> >>
> >>> >>             On Mon, Dec 11, 2017 at 9:05 PM, Kenneth Knowles
> >>> >> <k...@google.com
> >>> >>             <mailto:k...@google.com>> wrote:
> >>> >>
> >>> >>                 There's also, with additional overhead,
> >>> >>
> >>> >>                 1a) A relocated and shipped package for each thing
> we
> >>> >> want to
> >>> >>                 relocate. I think this has also been tried outside
> >>> >> Beam...
> >>> >>
> >>> >>                 Pros:
> >>> >>                 * all the pros of 1) plus no bloat beyond what is
> >>> >> necessary
> >>> >>                 Cons:
> >>> >>                 * abandons whitelist approach for public deps,
> >>> >> reverting to
> >>> >>                 blacklist approach for trouble things like guava,
> so a
> >>> >> bit
> >>> >>                 less principled
> >>> >>
> >>> >>                 For both 1) and 1a) I would add:
> >>> >>
> >>> >>                 Pros:
> >>> >>                 * clearly readable dependency since code will
> `import
> >>> >>                 org.apache.beam.private.guava21` and IDEs will
> >>> >> understand it
> >>> >>                 is a distinct lilbrary
> >>> >>                 * can run tests on unpackaged classes, as long as
> deps
> >>> >> are
> >>> >>                 shaded or provided as jars
> >>> >>                 * no mysterious action at a distance from inherited
> >>> >> configuration
> >>> >>                 Cons:
> >>> >>                 * need to adjust imports
> >>> >>
> >>> >>                 Kenn
> >>> >>
> >>> >>                 On Mon, Dec 11, 2017 at 9:57 AM, Lukasz Cwik
> >>> >> <lc...@google.com
> >>> >>                 <mailto:lc...@google.com>> wrote:
> >>> >>
> >>> >>                     I would suggest that either we use:
> >>> >>                     1) A common deps package containing shaded
> >>> >> dependencies
> >>> >>                     allows for
> >>> >>                     Pros
> >>> >>                     * doesn't require the user to build an uber jar
> >>> >>                     Risks
> >>> >>                     * dependencies package will keep growing even if
> >>> >> something
> >>> >>                     is or isn't needed by all of Apache Beam leading
> >>> >> to a
> >>> >>                     large jar anyways negating any space savings
> >>> >>
> >>> >>                     2) Shade within each module to a common location
> >>> >> like
> >>> >>                     org.apache.beam.relocated.guava....
> >>> >>                     Pros
> >>> >>                     * you only get the shaded dependencies of the
> >>> >> things that
> >>> >>                     are required
> >>> >>                     * its one less dependency for users to manage
> >>> >>                     Risks
> >>> >>                     * requires an uber jar to be built to get the
> >>> >> space
> >>> >>                     savings (either by a user or a distribution of
> >>> >> Apache
> >>> >>                     Beam) otherwise we negate any space savings.
> >>> >>
> >>> >>                     If we either use a common relocation scheme or a
> >>> >>                     dependencies jar then each relocation should
> >>> >> specifically
> >>> >>                     contain the version number of the package
> because
> >>> >> we would
> >>> >>                     like to allow for us to be using two different
> >>> >> versions of
> >>> >>                     the same library.
> >>> >>
> >>> >>                     For the common deps package approach, should we
> >>> >> check in
> >>> >>                     code where the imports contain the relocated
> >>> >> location
> >>> >>                     (e.g. import org.apache.beam.guava.20.0.com
> >>> >>
> >>> >> <http://org.apache.beam.guava.20.0.com/>.google.common.
> collect.ImmutableList)?
> >>> >>
> >>> >>
> >>> >>                     On Mon, Dec 11, 2017 at 8:47 AM, Jean-Baptiste
> >>> >> Onofré
> >>> >>                     <j...@nanthrax.net <mailto:j...@nanthrax.net>>
> wrote:
> >>> >>
> >>> >>                         Thanks for bringing that back.
> >>> >>
> >>> >>                         Indeed guava is shaded in different
> uber-jar.
> >>> >> Maybe we
> >>> >>                         can have a common deps module that we
> include
> >>> >> once
> >>> >>                         (but the user will have to explicitly define
> >>> >> the dep) ?
> >>> >>
> >>> >>                         Basically, what do you propose for protobuf
> >>> >>                         (unfortunately, I don't see an obvious) ?
> >>> >>
> >>> >>                         Regards
> >>> >>                         JB
> >>> >>
> >>> >>
> >>> >>                         On 12/11/2017 05:35 PM, Ismaël Mejía wrote:
> >>> >>
> >>> >>                             Hello, I wanted to bring back this
> subject
> >>> >> because
> >>> >>                             I think we should
> >>> >>                             take action on this and at least first
> >>> >> have a
> >>> >>                             shaded version of guava.
> >>> >>                             I was playing with a toy project and I
> did
> >>> >> the
> >>> >>                             procedure we use to
> >>> >>                             submit jars to a Hadoop cluster via
> >>> >> Flink/Spark
> >>> >>                             which involves
> >>> >>                             creating an uber jar and I realized that
> >>> >> the size
> >>> >>                             of the jar was way
> >>> >>                             bigger than I expected, and the fact
> that
> >>> >> we shade
> >>> >>                             guava in every
> >>> >>                             module contributes to this. I found
> guava
> >>> >> shaded on:
> >>> >>
> >>> >>                             sdks/java/core
> >>> >>                             runners/core-construction-java
> >>> >>                             runners/core-java
> >>> >>                             model/job-management
> >>> >>                             runners/spark
> >>> >>                             sdks/java/io/hadoop-file-system
> >>> >>                             sdks/java/io/kafka
> >>> >>
> >>> >>                             This means at least 6 times more of the
> >>> >> size it
> >>> >>                             should which counts in
> >>> >>                             around 15MB more (2.4MB*6 deps) of extra
> >>> >> weight
> >>> >>                             that we can simply
> >>> >>                             reduce by using a shaded version of the
> >>> >> library.
> >>> >>
> >>> >>                             I add this point to the previous ones
> >>> >> mentioned
> >>> >>                             during the discussion
> >>> >>                             because this goes against the end user
> >>> >> experience
> >>> >>                             and affects us all
> >>> >>                             (devs/users).
> >>> >>
> >>> >>                             Another question is if we should shade
> >>> >> (and how)
> >>> >>                             protocol buffers
> >>> >>                             because now with the portability work we
> >>> >> are
> >>> >>                             exposing it closer to the
> >>> >>                             end users. I say this because I also
> found
> >>> >> an
> >>> >>                             issue while running a
> >>> >>                             job on YARN with the spark runner
> because
> >>> >>                             hadoop-common includes
> >>> >>                             protobuf-java 2 and I had to explicitly
> >>> >> provide
> >>> >>                             protocol-buffers 3 as
> >>> >>                             a dependency to be able to use triggers
> >>> >> (note the
> >>> >>                             Spark runner
> >>> >>                             translates them using some method from
> >>> >>                             runners/core-java). Since
> >>> >>                             hadoop-common is provided in the cluster
> >>> >> with the
> >>> >>                             older version of
> >>> >>                             protobuf, I am afraid that this will
> bite
> >>> >> us in
> >>> >>                             the future.
> >>> >>
> >>> >>                             Ismaël
> >>> >>
> >>> >>                             ps. There is already a JIRA for that
> >>> >> shading for
> >>> >>                             protobuf on
> >>> >>                             hadoop-common but this is not coming
> until
> >>> >> version
> >>> >>                             3 is out.
> >>> >>
> >>> >> https://issues.apache.org/jira/browse/HADOOP-13136
> >>> >>
> >>> >> <https://issues.apache.org/jira/browse/HADOOP-13136>
> >>> >>
> >>> >>                             ps2. Extra curious situation is to see
> >>> >> that the
> >>> >>                             dataflow-runner ends
> >>> >>                             up having guava shaded twice via its
> >>> >> shaded version on
> >>> >>                             core-construction-java.
> >>> >>
> >>> >>                             ps3. Of course this message means a
> >>> >> de-facto +1 at
> >>> >>                             least to do it for
> >>> >>                             guava and evaluate it for other libs.
> >>> >>
> >>> >>
> >>> >>                             On Tue, Oct 17, 2017 at 7:29 PM, Lukasz
> >>> >> Cwik
> >>> >>                             <lc...@google.com.invalid
> >>> >>                             <mailto:lc...@google.com.invalid>>
> wrote:
> >>> >>
> >>> >>                                 An issue to call out is how to deal
> >>> >> with our
> >>> >>                                 generated code (.avro and
> >>> >>                                 .proto) as I don't believe those
> >>> >> plugins allow
> >>> >>                                 you to generate code using a
> >>> >>                                 shaded package prefix on imports.
> >>> >>
> >>> >>                                 On Tue, Oct 17, 2017 at 10:28 AM,
> >>> >> Thomas Groh
> >>> >>                                 <tg...@google.com.invalid
> >>> >>                                 <mailto:tg...@google.com.invalid>>
> >>> >>                                 wrote:
> >>> >>
> >>> >>                                     +1 to the goal. I'm hugely in
> >>> >> favor of not
> >>> >>                                     doing the same shading work
> >>> >>                                     every time for dependencies we
> >>> >> know we'll use.
> >>> >>
> >>> >>                                     This also means that if we end
> up
> >>> >> pulling
> >>> >>                                     in transitive dependencies we
> >>> >>                                     don't want in any particular
> >>> >> module we can
> >>> >>                                     avoid having to adjust our
> >>> >>                                     repackaging strategy for that
> >>> >> module -
> >>> >>                                     which I have run into face-first
> >>> >> in
> >>> >>                                     the past.
> >>> >>
> >>> >>                                     On Tue, Oct 17, 2017 at 9:48 AM,
> >>> >> Kenneth
> >>> >>                                     Knowles <k...@google.com.invalid
> >>> >>                                     <mailto:k...@google.com.invalid
> >>
> >>> >>                                     wrote:
> >>> >>
> >>> >>                                         Hi all,
> >>> >>
> >>> >>                                         Shading is a big part of how
> >>> >> we keep
> >>> >>                                         our dependencies sane in
> Beam.
> >>> >> But
> >>> >>                                         downsides: shading is super
> >>> >> slow,
> >>> >>                                         causes massive jar bloat,
> and
> >>> >> kind of
> >>> >>                                         hard to get right because
> >>> >> artifacts
> >>> >>                                         and namespaces are not
> 1-to-1.
> >>> >>
> >>> >>                                         I know that some communities
> >>> >>                                         distribute their own shaded
> >>> >>                                         distributions of
> >>> >>                                         dependencies. I had a
> thought
> >>> >> about
> >>> >>                                         doing something similar
> that I
> >>> >> wanted
> >>> >>                                         to throw out there for
> people
> >>> >> to poke
> >>> >>                                         holes in.
> >>> >>
> >>> >>                                         To set the scene, here is
> how
> >>> >> I view
> >>> >>                                         shading:
> >>> >>
> >>> >>                                           - A module has public
> >>> >> dependencies
> >>> >>                                         and private dependencies.
> >>> >>                                           - Public deps are used for
> >>> >> data
> >>> >>                                         interchange; users must
> share
> >>> >> these
> >>> >>
> >>> >>                                     deps.
> >>> >>
> >>> >>                                           - Private deps are just
> >>> >>                                         functionality and can be
> >>> >> hidden (in
> >>> >>                                         our case,
> >>> >>                                         relocated + bundled)
> >>> >>                                           - It isn't necessarily
> that
> >>> >> simple,
> >>> >>                                         because public and private
> >>> >> deps
> >>> >>
> >>> >>                                     might
> >>> >>
> >>> >>                                         interact in higher-order
> ways
> >>> >>                                         ("public" is contagious)
> >>> >>
> >>> >>                                         Shading is an implementation
> >>> >> detail of
> >>> >>                                         expressing these
> >>> >> characteristics.
> >>> >>
> >>> >>                                     We
> >>> >>
> >>> >>                                         use shading selectively
> >>> >> because of its
> >>> >>                                         downsides I mentioned above.
> >>> >>
> >>> >>                                         But what about this idea:
> >>> >> Introduce
> >>> >>                                         shaded deps as a single
> >>> >> separate
> >>> >>                                         artifact.
> >>> >>
> >>> >>                                           - sdks/java/private-deps:
> >>> >> bundled
> >>> >>                                         uber jar with relocated
> >>> >> versions of
> >>> >>                                         everything we want to shade
> >>> >>
> >>> >>                                           - sdks/java/core and
> >>> >>                                         sdks/java/harness: no
> >>> >> relocation or
> >>> >>                                         bundling -
> >>> >>                                         depends on
> >>> >>
>  `beam-sdks-java-private-deps`
> >>> >> and
> >>> >>                                         imports like
> >>> >>
> >>> >> `org.apache.beam.sdk.private.com.google.common`
> >>> >>                                         directly (this is what
> >>> >>                                         they
> >>> >>                                         are rewritten to
> >>> >>
> >>> >>                                         Some benefits
> >>> >>
> >>> >>                                           - much faster builds of
> >>> >> other modules
> >>> >>                                           - only one shaded uber jar
> >>> >>                                           - rare/no rebuilds of the
> >>> >> uber jar
> >>> >>                                           - can use maven enforcer
> to
> >>> >> forbid
> >>> >>                                         imports like
> com.google.common
> >>> >>                                           - configuration all in one
> >>> >> place
> >>> >>                                           - no automated rewriting
> of
> >>> >> our real
> >>> >>                                         code, which has led to some
> >>> >> major
> >>> >>                                         confusion
> >>> >>                                           - easy to implement
> >>> >> incrementally
> >>> >>
> >>> >>                                         Downsides:
> >>> >>
> >>> >>                                           - plenty of effort work to
> >>> >> get there
> >>> >>                                           - unclear how many
> different
> >>> >> such
> >>> >>                                         deps modules we need;
> sharing
> >>> >> them
> >>> >>
> >>> >>                                     could
> >>> >>
> >>> >>                                         get weird
> >>> >>                                           - if we hit a roadblock,
> we
> >>> >> will
> >>> >>                                         have committed a lot of time
> >>> >>
> >>> >>                                         Just something I was musing
> as
> >>> >> I spent
> >>> >>                                         another evening waiting for
> >>> >> slow
> >>> >>                                         builds to try to confirm
> >>> >> changes to
> >>> >>                                         brittle poms.
> >>> >>
> >>> >>                                         Kenn
> >>> >>
> >>> >>
> >>> >>
> >>> >>                         --
> >>> >>                         Jean-Baptiste Onofré
> >>> >>                         jbono...@apache.org
> >>> >> <mailto:jbono...@apache.org>
> >>> >>                         http://blog.nanthrax.net
> >>> >> <http://blog.nanthrax.net/>
> >>> >>                         Talend - http://www.talend.com
> >>> >> <http://www.talend.com/>
> >>> >>
> >>> >>
> >>> >>
> >>> >>
> >>> >
> >>> >
> >>> >
> >>>
> >>> --
> >>> Jean-Baptiste Onofré
> >>> jbono...@apache.org
> >>> http://blog.nanthrax.net
> >>> Talend - http://www.talend.com
> >>
> >>
> >
>

Reply via email to