I believe that shading is a net win. Many larger projects have hundreds of
transitive dependencies, and using a complex library like Beam alongside
another complex library like Spark or Hadoop quickly becomes untenable
without something like shading because of version compatibility issues. I
also believe that shading simplifies the getting started experience for
many users, since we would only need to expose the dependencies which cross
our API boundaries.

It does come at the cost of dealing with libraries that don't honor API
boundaries (e.g. reflection, serialization, code generation libraries) and
either finding effective workarounds or increasing the API surface of what
is not shaded, all of which is extra work for Beam maintainers.
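
To make that cost concrete, here is a minimal sketch of why name-based
lookups break under relocation. It is not Beam's or Kryo's actual code: the
relocation prefix, the toy registry, and the fully qualified ReverseList
name are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class ShadedNameLookup {
    // Hypothetical relocation rule, in the spirit of a shading "relocation".
    static final String ORIGINAL_PREFIX = "com.google.common.";
    static final String SHADED_PREFIX = "org.example.repackaged.com.google.common.";

    // Assumed class name for illustration only.
    static final String REVERSE_LIST = "com.google.common.collect.Lists$ReverseList";

    /** Returns the name a class would carry after the relocation above. */
    static String relocate(String className) {
        if (className.startsWith(ORIGINAL_PREFIX)) {
            return SHADED_PREFIX + className.substring(ORIGINAL_PREFIX.length());
        }
        return className;
    }

    /** Toy serializer registry keyed by class name (stand-in for a real framework). */
    static Map<String, String> registry() {
        Map<String, String> registry = new HashMap<>();
        // The shaded jar only contains the relocated class, so the user has to
        // register the serializer under the repackaged name.
        registry.put(relocate(REVERSE_LIST), "ReverseListSerializer");
        return registry;
    }

    public static void main(String[] args) {
        Map<String, String> registry = registry();
        // Lookup by the name a user would expect from the upstream library.
        System.out.println(registry.containsKey(REVERSE_LIST));
        // Lookup by the repackaged name that actually ships in the shaded jar.
        System.out.println(registry.containsKey(relocate(REVERSE_LIST)));
    }
}
```

The first lookup misses and the second one hits, which is the workaround
burden described above: anything keyed on class names has to know about the
repackaged names.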

It's not impossible to have a large project work with another large project
without shading, but it is not easy either, since we give up a lot of
version compatibility freedom.

This does not mean we can't have two artifacts, one shaded and one not, but
if we were to have both, would this hurt portability between runners?
If you have experience maintaining another project with or without
shading-like tech, I would love to hear about it as well.

On Mon, Jul 25, 2016 at 11:09 AM, Amit Sela <[email protected]> wrote:

> I wanted to raise the issue of the SDK providing a shaded jar for
> dependency.
>
> AFAIK it's generally considered bad practice, though eventually it's
> trade-offs. I can definitely understand why here
> <https://github.com/apache/incubator-beam/blob/master/sdks/java/core/pom.xml#L196>
> we shade Guava for example - so the user can use their desired Guava version -
> but on the other hand, runners eventually have to serialize/deserialize the
> SDK's classes sometimes.
> For example: Using "Top.largest(10)" to get top 10 results, uses the SDK's
> BoundedHeap, which is backed by a ReverseList which is currently not
> supported by Kryo (I've submitted a PR), but even if you write-up a
> ReverseListSerializer, in order to register the class you have to
> explicitly state its *repackaged* name... (I know you can use Coders to
> shuffle bytes around and so avoid Kryo serializing classes)
>
> I'm not saying that shading is completely wrong in some cases, but I would
> like to know more about the considerations made, and let's not forget that
> some runners (Spark for example) shade also... How risky is it for Beam to
> provide such shaded artifacts? What/How should we inform our users about
> it? i.e., Elastic published this
> <https://www.elastic.co/blog/to-shade-or-not-to-shade> about their choice
> for (not) shading.
>
> Thanks,
> Amit
>
