Sean Owen created SPARK-28877:
---------------------------------
Summary: Investigate/fix JAXB failure running Pyspark tests on JDK 11
Key: SPARK-28877
URL: https://issues.apache.org/jira/browse/SPARK-28877
Project: Spark
Issue Type: Sub-task
Components: Build, PySpark
Affects Versions: 3.0.0
Reporter: Sean Owen
It looks like we might have a test failure in Pyspark with JDK 11:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109686/console
{code}
======================================================================
ERROR: test_linear_regression_pmml_basic (pyspark.ml.tests.test_persistence.PersistenceTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/ml/tests/test_persistence.py", line 69, in test_linear_regression_pmml_basic
    model.write().format("pmml").save(lr_path)
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/ml/util.py", line 175, in save
    self._jwrite.save(path)
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1286, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql/utils.py", line 89, in deco
    return f(*a, **kw)
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.10.8.1-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling o529.save.
: javax.xml.bind.JAXBException
 - with linked exception:
[java.lang.ClassNotFoundException: com.sun.xml.internal.bind.v2.ContextFactory]
    at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:241)
    at javax.xml.bind.ContextFinder.find(ContextFinder.java:477)
    at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:656)
    at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:599)
    at org.jpmml.model.JAXBUtil.getContext(JAXBUtil.java:103)
    at org.jpmml.model.JAXBUtil.createMarshaller(JAXBUtil.java:132)
    at org.jpmml.model.JAXBUtil.marshal(JAXBUtil.java:77)
    at org.jpmml.model.JAXBUtil.marshalPMML(JAXBUtil.java:67)
    at org.apache.spark.mllib.pmml.PMMLExportable.toPMML(PMMLExportable.scala:44)
    at org.apache.spark.mllib.pmml.PMMLExportable.toPMML(PMMLExportable.scala:78)
    ...
{code}
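The error is typical of other JDK 11-related incompatibilities, because Java 11
removed the built-in JAXB implementation from Sun (the java.xml.bind module,
which Java 9 had already deprecated and stopped resolving by default). It
appears that something on the classpath is still trying to load the 'old'
internal JAXB implementation.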
It's curious because the JVM-based tests appear to pass. This suggests it may
be more about how the Pyspark test classpath is constructed, and perhaps there
is an old dependency or something selecting this implementation via a
META-INF/MANIFEST.MF entry.
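One way to start narrowing this down might be to list which jars on the PySpark
test classpath carry JAXB-related entries. The following is only a rough sketch:
the function name find_jaxb_entries and the marker list are made up for
illustration, and the choice of markers (a META-INF/services provider file, the
JAXB API class, and the standalone reference implementation's ContextFactory) is
an assumption about what could influence the lookup, not a confirmed cause.

```python
import zipfile
from pathlib import Path

# Hypothetical list of entries worth looking for; these are assumptions,
# not a confirmed explanation of the failure.
JAXB_MARKERS = (
    "META-INF/services/javax.xml.bind.JAXBContext",
    "javax/xml/bind/JAXBContext.class",
    "com/sun/xml/bind/v2/ContextFactory.class",
)


def find_jaxb_entries(jars_dir):
    """Map each jar directly under jars_dir to the JAXB markers it contains."""
    hits = {}
    for jar in sorted(Path(jars_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                names = set(zf.namelist())
        except zipfile.BadZipFile:
            continue  # skip files that are not valid archives
        found = [marker for marker in JAXB_MARKERS if marker in names]
        if found:
            hits[jar.name] = found
    return hits
```

Calling find_jaxb_entries on the directory of jars that the PySpark tests
actually use, and comparing the result with the jars on the JVM test classpath,
could show whether an older JAXB API jar is present in one and not the other.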
It's also curious because Pyspark tests appeared to pass with JDK 11 during
earlier testing. The cause is likely related to how Pyspark tests are run, but
it still needs a reproduction and an explanation.