I wanted to raise the issue of the SDK providing a shaded jar as a
dependency.

AFAIK it's generally considered bad practice, though ultimately it comes
down to trade-offs. I can definitely understand why we shade Guava here
<https://github.com/apache/incubator-beam/blob/master/sdks/java/core/pom.xml#L196>,
for example, so that users can use their desired Guava version. On the
other hand, runners sometimes have to serialize/deserialize the SDK's
classes.
For example, using "Top.largest(10)" to get the top 10 results relies on
the SDK's BoundedHeap, which is backed by a ReverseList that Kryo does not
currently support (I've submitted a PR). But even if you write a
ReverseListSerializer, registering the class requires explicitly stating
its *repackaged* name. (I know you can use Coders to shuffle bytes around
and thus avoid having Kryo serialize classes.)
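To illustrate what "explicitly stating the repackaged name" means, here is
a minimal sketch. It assumes the shade plugin relocates com.google.common
to org.apache.beam.sdk.repackaged.com.google.common (an assumption; check
the <relocation> entries in the pom.xml linked above). The class
ShadedClassLookup and its helpers are hypothetical, and the Kryo call is
only shown in a comment so the sketch runs with the JDK alone.

```java
import java.util.Optional;

public class ShadedClassLookup {

    // Assumed relocation: the SDK's shade plugin rewrites "com.google.common"
    // to SHADED_PREFIX. The real prefix comes from the SDK's pom.xml.
    static final String ORIGINAL_PREFIX = "com.google.common.";
    static final String SHADED_PREFIX =
            "org.apache.beam.sdk.repackaged.com.google.common.";

    /** Maps an original Guava class name to its (assumed) relocated name. */
    static String relocate(String originalName) {
        if (!originalName.startsWith(ORIGINAL_PREFIX)) {
            return originalName; // package is not relocated; name unchanged
        }
        return SHADED_PREFIX + originalName.substring(ORIGINAL_PREFIX.length());
    }

    /** Loads a class by its relocated name, if it is on the classpath. */
    static Optional<Class<?>> loadRelocated(String originalName) {
        try {
            return Optional.of(Class.forName(relocate(originalName)));
        } catch (ClassNotFoundException e) {
            return Optional.empty(); // shaded SDK jar not on the classpath
        }
    }

    public static void main(String[] args) {
        // Guava's private reverse-view class, referenced by its binary name.
        String original = "com.google.common.collect.Lists$ReverseList";

        // Prints org.apache.beam.sdk.repackaged.com.google.common.collect.Lists$ReverseList
        System.out.println(relocate(original));

        // With a Kryo instance in scope, registration would look roughly like:
        //   loadRelocated(original)
        //       .ifPresent(c -> kryo.register(c, new ReverseListSerializer()));
        // Registering "com.google.common.collect.Lists$ReverseList" directly
        // would not match the class the shaded SDK actually instantiates.
        System.out.println("found on classpath: " + loadRelocated(original).isPresent());
    }
}
```

The point is that the user-facing registration code has to know about the
SDK's internal relocation prefix, which couples it to a packaging detail.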

I'm not saying that shading is wrong in every case, but I would like to
know more about the considerations behind it, and let's not forget that
some runners (Spark, for example) also shade... How risky is it for Beam
to provide such shaded artifacts? What should we tell our users about it,
and how? For example, Elastic published this
<https://www.elastic.co/blog/to-shade-or-not-to-shade> about their choice
(not) to shade.

Thanks,
Amit
