Looks great, Kenn!

Max: what is the story behind having a separate flink-shaded repo? Did that 
make it easier to manage in some way?

Better separation of concerns, but I don't think releasing the shaded artifacts from the main repo is a problem. I'd even prefer not to split up the repo because it makes updates to the vendored dependencies slightly easier.

On 23.10.18 03:25, Kenneth Knowles wrote:
OK, I've filed https://issues.apache.org/jira/browse/BEAM-5819 to collect sub-tasks. This has enough upsides throughout lots of areas of the project that even though it is not glamorous it seems pretty valuable to start on immediately. And I want to find out if there's a pitfall lurking.

Max: what is the story behind having a separate flink-shaded repo? Did that make it easier to manage in some way?

Kenn

On Mon, Oct 22, 2018 at 2:55 AM Maximilian Michels <[email protected] <mailto:[email protected]>> wrote:

    +1 for publishing vendored Jars independently. It will improve build
    time and ease IntelliJ integration.

    Flink also publishes shaded dependencies separately:

    - https://github.com/apache/flink-shaded
    - https://issues.apache.org/jira/browse/FLINK-6529

    AFAIK their main motivation was to get rid of duplicate shaded classes
    on the classpath. We don't appear to have that problem because we
    already have a separate "vendor" project.

     >  - With shading, it is hard (impossible?) to step into dependency
    code in IntelliJ's debugger, because the actual symbol at runtime
    does not match what is in the external jars

    This would be solved by releasing the sources of the shaded jars.
     From a
    legal perspective, this could be problematic as alluded to here:
    https://github.com/apache/flink-shaded/issues/25

    -Max

    On 20.10.18 01:11, Lukasz Cwik wrote:
     > I have tried several times to improve the build system and intellij
     > integration and each attempt ended with little progress when dealing
     > with vendored code. My latest attempt has been the most promising
    where
     > I take the vendored classes/jars and decompile them generating the
     > source that Intellij can then use. I have a branch[1] that
    demonstrates
     > the idea. It works pretty well (and up until a change where we
    started
     > vendoring gRPC, was impractical to do. Instructions to try it out
    are:
     >
     > // Clean up any remnants of prior builds/intellij projects
     > git clean -fdx
     > // Generated the source for vendored/shaded modules
     > ./gradlew decompile
     >
     > // Remove the "generated" Java sources for protos so they don't
    conflict with the decompiled sources.
     > rm -rf model/pipeline/build/generated/source/proto
     > rm -rf model/job-management/build/generated/source/proto
     > rm -rf model/fn-execution/build/generated/source/proto
     > // Import the project into Intellij, most code completion now
    works still some issues with a few classes.
     > // Note that the Java decompiler doesn't generate valid source so
    still need to delegate to Gradle for build/run/test actions
     > // Other decompilers may do a better/worse job but haven't tried
    them.
     >
     >
     > The problems that I face are that the generated Java source from the
     > protos and the decompiled source from the compiled version of that
     > source post shading are both being imported as content roots and
    then
     > conflict. Also, the CFR decompiler isn't producing valid source, if
     > people could try others and report their mileage, we may find one
    that
     > works and then we would be able to use intellij to build/run our
    code
     > and not need to delegate all our build/run/test actions to Gradle.
     >
     > After all these attempts I have done, vendoring the dependencies
    outside
     > of the project seems like a sane approach and unless someone
    wants to
     > take a stab at the best progress I have made above, I would go
    with what
     > Kenn is suggesting even though it will mean that we will need to
    perform
     > releases every time we want to change the version of one of our
    vendored
     > dependencies.
     >
     > 1: https://github.com/lukecwik/incubator-beam/tree/intellij
     >
     >
     > On Fri, Oct 19, 2018 at 10:43 AM Kenneth Knowles <[email protected]
    <mailto:[email protected]>
     > <mailto:[email protected] <mailto:[email protected]>>> wrote:
     >
     >     Another reason to push on this is to get build times down.
    Once only
     >     generated proto classes use the shadow plugin we'll cut the build
     >     time in ~half? And there is no reason to constantly re-vendor.
     >
     >     Kenn
     >
     >     On Fri, Oct 19, 2018 at 10:39 AM Kenneth Knowles
    <[email protected] <mailto:[email protected]>
     >     <mailto:[email protected] <mailto:[email protected]>>> wrote:
     >
     >         Hi all,
     >
     >         A while ago we had pretty good consensus that we should
    vendor
     >         ("pre-shade") specific dependencies, and there's start on it
     >         here: https://github.com/apache/beam/tree/master/vendor
     >
     >         IntelliJ notes:
     >
     >           - With shading, it is hard (impossible?) to step into
     >         dependency code in IntelliJ's debugger, because the actual
     >         symbol at runtime does not match what is in the external jars
     >
     >
     > Intellij can step through the classes if they were published
    outside the
     > project since it can decompile them. The source won't be legible.
     > Decompiling the source as in my example effectively shows the
    same issue.
     >
     >           - With vendoring, if the vendored dependencies are part
    of the
     >         project then IntelliJ gets confused because it operates on
     >         source, not the produced jars
     >
     >
     > Yes, I tried several ways to get intellij to ignore the source
    and use
     > the output jars but no luck.
     >
     >         The second one has a quick fix for most cases*: don't
    make the
     >         vendored dep a subproject, but just separately build and
    publish
     >         it. Since a vendored dependency should change much more
     >         infrequently (or if we bake the version into the name,
    ~never)
     >         this means we publish once and save headache and build
    time forever.
     >
     >         WDYT? Have I overlooked something? How about we set up
    vendored
     >         versions of guava, protobuf, grpc, and publish them. We don't
     >         have to actually start using them yet, and can do it
    incrementally.
     >
     >
     > Currently we are relocating code depending on the version string.
    If the
     > major version is >= 1, we use only the major version within the
    package
     > string and rely on semantic versioning provided by the dependency
    to not
     > break people. If the major version is 0, we assume the dependency is
     > unstable and use the full version as part of the package string
    during
     > relocation.
     >
     > The downside of using the full version string for relocated packages:
     > 1) Users will end up with multiple copies of dependencies that
    differ
     > only by the minor or patch version increasing the size of their
    application.
     > 2) Bumping up the version of a dependency now requires the import
     > statement in all java files to be updated (not too difficult with
    some
     > sed/grep skills)
     >
     > The upside of using the full version string in the relocated package:
     > 1) We don't have to worry about whether a dependency maintains
    semantic
     > versioning which means our users won't have to worry about that
    either.
     > 2) This increases the odds that a user will load multiple slightly
     > different versions of the same dependency which is known to be
     > incompatible in certain situations (e.g. Netty 4.1.25 can't be on
    the
     > classpath with Netty 4.1.28 even though they are both shaded due to
     > issues of how JNI with tcnative works).
     >
     >
     >         (side note: what do other projects like Flink do?)
     >
     >         Kenn
     >
     >         *for generated proto classes, they need to be altered after
     >         being generated so shading happens there, but actually only
     >         relocation and the shared vendored dep should work
    elsewhere in
     >         the project
     >
     >
     > We could publish the protos and treat them as "external"
    dependencies
     > within the Java projects which would also remove this pain point.

Reply via email to