Pardon my late entry into the fray here, but we've just struggled through some library conflicts that could have been avoided, and their story sheds some light on this question.

We have been integrating Spark with a number of other components. We discovered several conflicts, most of them easily eliminated. But the ASM conflicts were not so easy to handle because of ASM's API changes between 3.x and 4.x (usually noticed first in ClassVisitor, which was an interface in 3.x and is an abstract class in 4.x).

spark-core_2.10 has a transitive dependency on ASM 4.0. Hive, Hadoop, various Java EE servlets, and other libraries have transitive dependencies on ASM 3.2 or earlier. In one of the applications we are developing, there are 10 libraries with ASM dependencies. Five are well behaved, having shaded ASM. The other five are poorly behaved, not shading it. The ASM FAQ specifically recommends shading ASM in any tool or framework which contains it: http://asm.ow2.org/doc/faq.html#Q15.

ASM has been shaded in the SBT build since June 2013. However, it was not properly shaded in the Maven build until last week. As a result, libraries such as spark-core_2.10 pushed to Maven Central haven't reflected the SBT build. This is documented in Jira SPARK-782: https://spark-project.atlassian.net/browse/SPARK-782

We cannot use SBT for our overall project. Maven is our standard. Hence, we are dependent on Maven Central and libraries mirrored by our corporate repository.

In this context, if both builds are maintained, then they need to have the same functionality.

If only one build is to be retained, it should be Maven, because Maven and other tools that use Maven Central are more likely to be used for large project integrations. For the same reason, the Maven build should be given higher priority than it has at present. It seems a bit odd, if a Maven project can be automatically generated from SBT, that it would take a year for ASM shading in Maven to catch up with SBT.

Thanks
Kevin Markey

SBT appears to have syntax for both, just like Maven. Surely these
have the same meanings in SBT, and excluding artifacts is accomplished
with exclude and excludeAll, as seen in the Spark build?
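
For illustration only, here is a minimal sketch of what exclude and
excludeAll look like in an sbt build definition. The dependencies and
versions are placeholders chosen for the example, not taken from the
Spark build, and it is an assumption that these particular artifacts
pull in the old "asm" group transitively:

```scala
// build.sbt fragment (sbt 0.13-style syntax). Coordinates and versions
// below are illustrative assumptions, not copied from the Spark build.
libraryDependencies ++= Seq(
  // exclude(organization, name): drop one specific transitive artifact,
  // here the ASM 3.x jar published under the "asm" group.
  ("org.apache.hadoop" % "hadoop-client" % "1.0.4").exclude("asm", "asm"),

  // excludeAll(rules*): drop everything matching a rule, here every
  // artifact published under the "asm" organization.
  ("org.apache.hive" % "hive-exec" % "0.12.0")
    .excludeAll(ExclusionRule(organization = "asm"))
)
```

Note that this only removes a conflicting jar from the classpath. It
does not relocate packages the way shading does, so it helps only when
the remaining ASM version is one every consumer can actually use.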

The assembly and shader stuff in Maven is more about controlling
exactly how it's put together into an artifact, at the level of files
even, to stick a license file in or exclude some data file cruft or
rename dependencies.

Exclusions and shading are necessary evils to be used as sparingly as
possible. Dependency graphs get nuts fast here, and Spark is already
quite big. (Hence my recent PR to start touching it up -- more coming
for sure.)

