OK, I've filed https://issues.apache.org/jira/browse/BEAM-5819 to collect
sub-tasks. This has enough upsides throughout lots of areas of the project
that even though it is not glamorous it seems pretty valuable to start on
immediately. And I want to find out if there's a pitfall lurking.

Max: what is the story behind having a separate flink-shaded repo? Did that
make it easier to manage in some way?

Kenn

On Mon, Oct 22, 2018 at 2:55 AM Maximilian Michels <[email protected]> wrote:

> +1 for publishing vendored Jars independently. It will improve build
> time and ease IntelliJ integration.
>
> Flink also publishes shaded dependencies separately:
>
> - https://github.com/apache/flink-shaded
> - https://issues.apache.org/jira/browse/FLINK-6529
>
> AFAIK their main motivation was to get rid of duplicate shaded classes
> on the classpath. We don't appear to have that problem because we
> already have a separate "vendor" project.
>
> >  - With shading, it is hard (impossible?) to step into dependency code
> in IntelliJ's debugger, because the actual symbol at runtime does not match
> what is in the external jars
>
> This would be solved by releasing the sources of the shaded jars. From a
> legal perspective, this could be problematic as alluded to here:
> https://github.com/apache/flink-shaded/issues/25
>
> -Max
>
> On 20.10.18 01:11, Lukasz Cwik wrote:
> > I have tried several times to improve the build system and intellij
> > integration and each attempt ended with little progress when dealing
> > with vendored code. My latest attempt has been the most promising where
> > I take the vendored classes/jars and decompile them generating the
> > source that Intellij can then use. I have a branch[1] that demonstrates
> > the idea. It works pretty well (and up until a change where we started
> > vendoring gRPC, was impractical to do. Instructions to try it out are:
> >
> > // Clean up any remnants of prior builds/intellij projects
> > git clean -fdx
> > // Generated the source for vendored/shaded modules
> > ./gradlew decompile
> >
> > // Remove the "generated" Java sources for protos so they don't conflict
> with the decompiled sources.
> > rm -rf model/pipeline/build/generated/source/proto
> > rm -rf model/job-management/build/generated/source/proto
> > rm -rf model/fn-execution/build/generated/source/proto
> > // Import the project into Intellij, most code completion now works
> still some issues with a few classes.
> > // Note that the Java decompiler doesn't generate valid source so still
> need to delegate to Gradle for build/run/test actions
> > // Other decompilers may do a better/worse job but haven't tried them.
> >
> >
> > The problems that I face are that the generated Java source from the
> > protos and the decompiled source from the compiled version of that
> > source post shading are both being imported as content roots and then
> > conflict. Also, the CFR decompiler isn't producing valid source, if
> > people could try others and report their mileage, we may find one that
> > works and then we would be able to use intellij to build/run our code
> > and not need to delegate all our build/run/test actions to Gradle.
> >
> > After all these attempts I have done, vendoring the dependencies outside
> > of the project seems like a sane approach and unless someone wants to
> > take a stab at the best progress I have made above, I would go with what
> > Kenn is suggesting even though it will mean that we will need to perform
> > releases every time we want to change the version of one of our vendored
> > dependencies.
> >
> > 1: https://github.com/lukecwik/incubator-beam/tree/intellij
> >
> >
> > On Fri, Oct 19, 2018 at 10:43 AM Kenneth Knowles <[email protected]
> > <mailto:[email protected]>> wrote:
> >
> >     Another reason to push on this is to get build times down. Once only
> >     generated proto classes use the shadow plugin we'll cut the build
> >     time in ~half? And there is no reason to constantly re-vendor.
> >
> >     Kenn
> >
> >     On Fri, Oct 19, 2018 at 10:39 AM Kenneth Knowles <[email protected]
> >     <mailto:[email protected]>> wrote:
> >
> >         Hi all,
> >
> >         A while ago we had pretty good consensus that we should vendor
> >         ("pre-shade") specific dependencies, and there's start on it
> >         here: https://github.com/apache/beam/tree/master/vendor
> >
> >         IntelliJ notes:
> >
> >           - With shading, it is hard (impossible?) to step into
> >         dependency code in IntelliJ's debugger, because the actual
> >         symbol at runtime does not match what is in the external jars
> >
> >
> > Intellij can step through the classes if they were published outside the
> > project since it can decompile them. The source won't be legible.
> > Decompiling the source as in my example effectively shows the same issue.
> >
> >           - With vendoring, if the vendored dependencies are part of the
> >         project then IntelliJ gets confused because it operates on
> >         source, not the produced jars
> >
> >
> > Yes, I tried several ways to get intellij to ignore the source and use
> > the output jars but no luck.
> >
> >         The second one has a quick fix for most cases*: don't make the
> >         vendored dep a subproject, but just separately build and publish
> >         it. Since a vendored dependency should change much more
> >         infrequently (or if we bake the version into the name, ~never)
> >         this means we publish once and save headache and build time
> forever.
> >
> >         WDYT? Have I overlooked something? How about we set up vendored
> >         versions of guava, protobuf, grpc, and publish them. We don't
> >         have to actually start using them yet, and can do it
> incrementally.
> >
> >
> > Currently we are relocating code depending on the version string. If the
> > major version is >= 1, we use only the major version within the package
> > string and rely on semantic versioning provided by the dependency to not
> > break people. If the major version is 0, we assume the dependency is
> > unstable and use the full version as part of the package string during
> > relocation.
> >
> > The downside of using the full version string for relocated packages:
> > 1) Users will end up with multiple copies of dependencies that differ
> > only by the minor or patch version increasing the size of their
> application.
> > 2) Bumping up the version of a dependency now requires the import
> > statement in all java files to be updated (not too difficult with some
> > sed/grep skills)
> >
> > The upside of using the full version string in the relocated package:
> > 1) We don't have to worry about whether a dependency maintains semantic
> > versioning which means our users won't have to worry about that either.
> > 2) This increases the odds that a user will load multiple slightly
> > different versions of the same dependency which is known to be
> > incompatible in certain situations (e.g. Netty 4.1.25 can't be on the
> > classpath with Netty 4.1.28 even though they are both shaded due to
> > issues of how JNI with tcnative works).
> >
> >
> >         (side note: what do other projects like Flink do?)
> >
> >         Kenn
> >
> >         *for generated proto classes, they need to be altered after
> >         being generated so shading happens there, but actually only
> >         relocation and the shared vendored dep should work elsewhere in
> >         the project
> >
> >
> > We could publish the protos and treat them as "external" dependencies
> > within the Java projects which would also remove this pain point.
>

Reply via email to