Jim Kleckner created SPARK-6029:
-----------------------------------
Summary: Spark excludes "fastutil" dependencies of "clearspring" quantiles
Key: SPARK-6029
URL: https://issues.apache.org/jira/browse/SPARK-6029
Project: Spark
Issue Type: Bug
Components: Build
Affects Versions: 1.2.1
Reporter: Jim Kleckner
Spark includes the clearspring analytics package but intentionally excludes its
fastutil dependency.
Spark also includes parquet-column, which bundles fastutil and relocates it
under parquet/. The resulting shaded jar file is incomplete, however: it leaves
out some of the fastutil classes, notably Long2LongOpenHashMap, which is
present in the fastutil jar file that parquet-column references.
We are using more of the clearspring classes (e.g. QDigest) and those do depend
on missing fastutil classes like Long2LongOpenHashMap.
Even though I add the missing classes to our assembly jar file, the class
loader finds the Spark assembly first and we get runtime class loader errors
when we try to use them.
The
[documentation|http://spark.apache.org/docs/1.2.0/configuration.html#runtime-environment]
and a possibly related issue,
[SPARK-939|https://issues.apache.org/jira/browse/SPARK-939], suggest arguments
that I tried with spark-submit:
{code}
--conf spark.driver.userClassPathFirst=true \
--conf spark.executor.userClassPathFirst=true
{code}
but we still get the class not found error.
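One way to narrow this down is to print where each class in question is actually loaded from. The following is only a rough diagnostic sketch, not anything that ships with Spark: the {{ClassOrigin}} helper is hypothetical, and the relocated name {{parquet.it.unimi.dsi.fastutil...}} is my assumption about how the parquet shading renames the package.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper: reports which jar (if any) a class resolves from.
class ClassOrigin {

    /** Describes where className is loaded from when resolved through loader. */
    static String describe(String className, ClassLoader loader) {
        try {
            // initialize=false: locate the class without running its static initializers
            Class<?> cls = Class.forName(className, false, loader);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes that come from the JDK itself usually have no code source
            if (src == null || src.getLocation() == null) {
                return "no code source (JDK/bootstrap class)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "not loadable: " + e;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        String[] names = {
            "com.clearspring.analytics.stream.quantile.QDigest",
            "it.unimi.dsi.fastutil.longs.Long2LongOpenHashMap",
            // Assumed relocated name inside the shaded parquet classes
            "parquet.it.unimi.dsi.fastutil.longs.Long2LongOpenHashMap"
        };
        for (String name : names) {
            System.out.println(name + " -> " + describe(name, loader));
        }
    }
}
```

Calling {{ClassOrigin.describe(...)}} once from the driver and once from inside a task should show whether the executors resolve these classes from our assembly jar or from the Spark assembly, which would help tell a {{userClassPathFirst}} problem apart from a packaging problem.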
Could this be a bug with {{userClassPathFirst=true}}? i.e. should it work?
In any case, would it be reasonable to not exclude the "fastutil" dependencies?
See email discussion
[here|http://apache-spark-user-list.1001560.n3.nabble.com/Fwd-Spark-excludes-quot-fastutil-quot-dependencies-we-need-tt21812.html]