Vladimir Matveev created SPARK-32385:
----------------------------------------
Summary: Publish a "bill of materials" (BOM) descriptor for Spark
with correct versions of various dependencies
Key: SPARK-32385
URL: https://issues.apache.org/jira/browse/SPARK-32385
Project: Spark
Issue Type: Improvement
Components: Build
Affects Versions: 3.0.0, 2.4.6
Reporter: Vladimir Matveev
Spark has a lot of dependencies, many of them very common (e.g. Guava,
Jackson). Spark also does not update these dependencies as often as new
versions are released upstream. This is understandable, but it means that Spark
often depends on an older version of a library that is incompatible with a
newer version of the same library. Such incompatibilities can manifest in
different ways, e.g. as classpath errors or as failed runtime version checks
(as with Jackson).
Spark does attempt to pin the versions of its dependencies by declaring them
explicitly in its {{pom.xml}} file. This approach is somewhat workable if the
project using Spark is itself built with Maven, but it breaks down with other
build systems such as Gradle. The reason is that Maven uses an unconventional
"nearest wins" conflict resolution strategy, which selects the version declared
closest to the root of the dependency tree. Many other tools, Gradle among them,
use a "highest wins" strategy instead, which selects the highest version
requested anywhere in the dependency graph. As a result, other dependencies of
the project can pull in a higher version of some library that is incompatible
with Spark.
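The difference between the two strategies can be sketched in a few lines of
Python. This is a toy model, not how Maven or Gradle are actually implemented.
The graph, artifact names and versions are hypothetical. Both resolvers simply
consider every declared edge, whereas real ones also drop the dependencies of
evicted versions and handle version ranges, exclusions and dependency
management. Ties at equal depth are broken by breadth-first encounter order.

```python
from collections import deque

# Hypothetical toy dependency graph (artifact names and versions are illustrative).
# Each (artifact, version) node maps to the dependencies it declares, in order.
# Versions are tuples so that Python's tuple comparison orders them numerically.
GRAPH = {
    ("my-app", (1, 0)): [("spark-core", (3, 0, 0)), ("some-lib", (1, 0))],
    ("spark-core", (3, 0, 0)): [("jackson-databind", (2, 10, 0))],
    ("some-lib", (1, 0)): [("other-lib", (2, 0))],
    ("other-lib", (2, 0)): [("jackson-databind", (2, 11, 0))],
    ("jackson-databind", (2, 10, 0)): [],
    ("jackson-databind", (2, 11, 0)): [],
}


def requests_by_depth(root):
    """Yield (depth, artifact, version) for every declared dependency edge,
    walking the graph breadth-first so that shallower edges come first."""
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        for artifact, version in GRAPH[node]:
            yield depth + 1, artifact, version
            if (artifact, version) not in seen:
                seen.add((artifact, version))
                queue.append(((artifact, version), depth + 1))


def nearest_wins(root):
    """Maven-style mediation: the declaration closest to the root wins;
    at equal depth, the one encountered first wins."""
    chosen = {}
    for _depth, artifact, version in requests_by_depth(root):
        chosen.setdefault(artifact, version)  # first hit is the nearest one
    return chosen


def highest_wins(root):
    """Gradle-style default: the highest version requested anywhere wins."""
    chosen = {}
    for _depth, artifact, version in requests_by_depth(root):
        if artifact not in chosen or version > chosen[artifact]:
            chosen[artifact] = version
    return chosen


root = ("my-app", (1, 0))
print(nearest_wins(root)["jackson-databind"])  # (2, 10, 0): the version Spark declares
print(highest_wins(root)["jackson-databind"])  # (2, 11, 0): pulled in via other-lib
```

Under "nearest wins", Spark's own declaration at depth 2 beats the one at
depth 3. Under "highest wins", depth is irrelevant and the unrelated
{{other-lib}} decides which Jackson version Spark runs with.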
One example is a direct or transitive dependency on a newer version of
Jackson in the project. Spark itself depends on several Jackson modules. If
only one of them is resolved to a higher version while the others stay on the
lower one, an internal version check in Jackson fails and throws an exception
at runtime.
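The failure mode can be illustrated with a hypothetical, simplified version
guard. It is loosely modelled on the kind of check a Jackson module such as
jackson-module-scala performs when it is registered: a module built for
databind X.Y accepts only databind versions in the X.Y line. The function name,
exception class and versions below are assumptions for illustration. They are
not Jackson's actual API.

```python
# Hypothetical resolution result: another dependency bumped only jackson-databind.
RESOLVED = {
    "jackson-core": (2, 10, 0),
    "jackson-annotations": (2, 10, 0),
    "jackson-databind": (2, 11, 0),
    "jackson-module-scala": (2, 10, 0),
}


class IncompatibleJacksonError(RuntimeError):
    """Hypothetical stand-in for the exception a real Jackson module throws."""


def fmt(version):
    """Render a version tuple such as (2, 10, 0) as "2.10.0"."""
    return ".".join(str(part) for part in version)


def check_databind_compat(module, module_version, databind_version):
    """Simplified guard: module version X.Y.Z accepts only a databind version
    >= X.Y.0 and < X.(Y+1).0, i.e. the same major.minor line."""
    major, minor = module_version[:2]
    if databind_version[:2] != (major, minor):
        raise IncompatibleJacksonError(
            f"{module} {fmt(module_version)} requires jackson-databind "
            f">= {major}.{minor}.0 and < {major}.{minor + 1}.0, "
            f"found {fmt(databind_version)}"
        )


try:
    check_databind_compat("jackson-module-scala",
                          RESOLVED["jackson-module-scala"],
                          RESOLVED["jackson-databind"])
except IncompatibleJacksonError as err:
    # jackson-module-scala 2.10.0 requires jackson-databind >= 2.10.0 and < 2.11.0, found 2.11.0
    print(err)
```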
A widely used solution for this kind of versioning problem is to publish a
"bill of materials" (BOM) descriptor (see the Maven documentation:
[https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html]
and the Gradle documentation:
[https://docs.gradle.org/current/userguide/platforms.html#sub:bom_import]).
This descriptor would list the versions of all of Spark's dependencies.
Downstream projects could then use their build system's BOM support to enforce
the version constraints Spark needs to function correctly.
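To make the proposal concrete, here is a minimal Python sketch of what
enforcing a BOM amounts to. It assumes the enforcing semantics of a Maven
<dependencyManagement> import or Gradle's enforcedPlatform(). For every artifact
the BOM lists, the BOM's version overrides whatever conflict resolution picked.
Artifacts the BOM does not list are left alone. A BOM only manages versions and
does not add dependencies by itself. Gradle's plain platform() is weaker,
because its versions still take part in conflict resolution instead of
overriding it. {{SPARK_BOM}}, {{apply_bom}} and all versions are hypothetical
stand-ins for the BOM proposed here.

```python
# Hypothetical BOM contents: artifact -> the version Spark was built and tested with.
SPARK_BOM = {
    "jackson-core": (2, 10, 0),
    "jackson-annotations": (2, 10, 0),
    "jackson-databind": (2, 10, 0),
    "jackson-module-scala": (2, 10, 0),
}


def apply_bom(resolved, bom):
    """Return a copy of `resolved` in which every artifact the BOM manages is
    pinned to the BOM's version. Unmanaged artifacts keep their resolved
    version, and BOM entries that were never resolved are not added."""
    return {artifact: bom.get(artifact, version)
            for artifact, version in resolved.items()}


# Hypothetical resolution result in which another dependency dragged
# jackson-databind up to 2.11.0 under "highest wins".
resolved = {
    "jackson-core": (2, 10, 0),
    "jackson-databind": (2, 11, 0),
    "guava": (28, 0),
}

aligned = apply_bom(resolved, SPARK_BOM)
print(aligned)
```

After alignment, every Jackson module in the graph is back on the same version
line. Guava, which this toy BOM does not manage, keeps its resolved version.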
One example of a successful BOM-based approach is Spring:
[https://www.baeldung.com/spring-maven-bom#spring-bom]. Spring publishes BOM
descriptors for its projects, e.g. Spring Boot, which downstream projects can
use to pin the versions of Spring components and their dependencies. This
significantly reduces confusion about the correct version numbers.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)