I have tried several times to improve the build system and IntelliJ
integration, and each attempt ended with little progress when dealing with
vendored code. My latest attempt has been the most promising: I take
the vendored classes/jars and decompile them, generating source that
IntelliJ can then use. I have a branch[1] that demonstrates the idea. It
works pretty well (up until a change where we started vendoring gRPC,
this was impractical to do). Instructions to try it out are:

// Clean up any remnants of prior builds/intellij projects
git clean -fdx
// Generate the source for the vendored/shaded modules
./gradlew decompile

// Remove the "generated" Java sources for protos so they don't
// conflict with the decompiled sources.
rm -rf model/pipeline/build/generated/source/proto
rm -rf model/job-management/build/generated/source/proto
rm -rf model/fn-execution/build/generated/source/proto
// Import the project into IntelliJ; most code completion now works,
// though there are still some issues with a few classes.
// Note that the Java decompiler doesn't generate valid source, so we
// still need to delegate to Gradle for build/run/test actions.
// Other decompilers may do a better/worse job, but I haven't tried them.


The problems I face are that the generated Java source from the protos
and the decompiled source (from the compiled, post-shading version of that
source) are both being imported as content roots and then conflict. Also,
the CFR decompiler isn't producing valid source; if people could try other
decompilers and report their mileage, we may find one that works, and then
we would be able to use IntelliJ to build/run our code instead of delegating
all our build/run/test actions to Gradle.

After all these attempts, vendoring the dependencies outside of
the project seems like a sane approach. Unless someone wants to take a
stab at improving on the best progress I have made above, I would go with
what Kenn is suggesting, even though it means we will need to perform a
release every time we want to change the version of one of our vendored
dependencies.

1: https://github.com/lukecwik/incubator-beam/tree/intellij


On Fri, Oct 19, 2018 at 10:43 AM Kenneth Knowles <[email protected]> wrote:

> Another reason to push on this is to get build times down. Once only
> generated proto classes use the shadow plugin we'll cut the build time in
> ~half? And there is no reason to constantly re-vendor.
>
> Kenn
>
> On Fri, Oct 19, 2018 at 10:39 AM Kenneth Knowles <[email protected]> wrote:
>
>> Hi all,
>>
>> A while ago we had pretty good consensus that we should vendor
>> ("pre-shade") specific dependencies, and there's a start on it here:
>> https://github.com/apache/beam/tree/master/vendor
>>
>> IntelliJ notes:
>>
>>  - With shading, it is hard (impossible?) to step into dependency code in
>> IntelliJ's debugger, because the actual symbol at runtime does not match
>> what is in the external jars
>>
>
IntelliJ can step through the classes if they are published outside the
project, since it can decompile them, but the source won't be legible.
Decompiling the source as in my example above effectively shows the same
issue.


>  - With vendoring, if the vendored dependencies are part of the project
>> then IntelliJ gets confused because it operates on source, not the produced
>> jars
>>
>
Yes, I tried several ways to get IntelliJ to ignore the source and use the
output jars, but had no luck.


> The second one has a quick fix for most cases*: don't make the vendored
>> dep a subproject, but just separately build and publish it. Since a
>> vendored dependency should change much more infrequently (or if we bake the
>> version into the name, ~never) this means we publish once and save headache
>> and build time forever.
>>
>> WDYT? Have I overlooked something? How about we set up vendored versions
>> of guava, protobuf, grpc, and publish them. We don't have to actually start
>> using them yet, and can do it incrementally.
>>
>
Currently we relocate code based on the dependency's version string. If the
major version is >= 1, we use only the major version within the package
string and rely on the semantic versioning provided by the dependency to not
break people. If the major version is 0, we assume the dependency is
unstable and use the full version as part of the package string during
relocation.
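To illustrate the rule, here is a minimal sketch; the function name and the sample version numbers are made-up examples, not anything from the actual build:

```shell
# Compute the relocated-package suffix from a dependency version string.
relocated_suffix() {
  version="$1"
  major="${version%%.*}"          # text before the first dot
  if [ "$major" -ge 1 ]; then
    # Stable dependency: only the major version goes into the package.
    echo "v${major}"
  else
    # 0.x dependency: the full version, dots replaced for a valid package.
    echo "v$(echo "$version" | tr . _)"
  fi
}

relocated_suffix 1.13.1   # -> v1
relocated_suffix 0.20.0   # -> v0_20_0
```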

The downside of using the full version string for relocated packages:
1) Users will end up with multiple copies of dependencies that differ only
by the minor or patch version, increasing the size of their application.
2) Bumping the version of a dependency requires the import statements
in all Java files to be updated (not too difficult with some sed/grep
skills).
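For point 2, the update can be scripted; this is a hedged sketch where the directory, file, and relocated package names are invented examples rather than real Beam coordinates:

```shell
# Create a throwaway Java file with an import of an old relocated version.
mkdir -p /tmp/vendor-demo
cat > /tmp/vendor-demo/Example.java <<'EOF'
import org.apache.beam.vendor.netty.v4_1_25.io.netty.buffer.ByteBuf;
EOF

# Rewrite every import of the old relocated package to the new one.
find /tmp/vendor-demo -name '*.java' -exec \
  sed -i 's/vendor\.netty\.v4_1_25/vendor.netty.v4_1_28/g' {} +

grep vendor /tmp/vendor-demo/Example.java
# -> import org.apache.beam.vendor.netty.v4_1_28.io.netty.buffer.ByteBuf;
```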

The upside of using the full version string in the relocated package:
1) We don't have to worry about whether a dependency maintains semantic
versioning, which means our users won't have to worry about that either.
2) This increases the odds that a user will load multiple slightly
different versions of the same dependency, which is known to be incompatible
in certain situations (e.g. Netty 4.1.25 can't be on the classpath with
Netty 4.1.28, even though both are shaded, due to how JNI with tcnative
works).


>
>> (side note: what do other projects like Flink do?)
>>
>> Kenn
>>
>> *for generated proto classes, they need to be altered after being
>> generated so shading happens there, but actually only relocation and the
>> shared vendored dep should work elsewhere in the project
>>
>
We could publish the protos and treat them as "external" dependencies
within the Java projects, which would also remove this pain point.
