[ 
https://issues.apache.org/jira/browse/SPARK-15526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Villu Ruusmann updated SPARK-15526:
-----------------------------------
    Description: 
The Spark-MLlib module depends on the JPMML-Model library 
(org.jpmml:pmml-model:1.2.7) for its PMML export capabilities. The JPMML-Model 
library is included in the Apache Spark assembly, which makes it very difficult 
to build and deploy competing PMML exporters that may wish to depend on 
different versions (typically much newer) of the same library.

JPMML-Model library classes are not part of Apache Spark public APIs, so it 
shouldn't be a problem if they are relocated by prepending a prefix 
"org.spark_project" to their package names using Maven Shade Plugin. The 
requested treatment is identical to how Google Guava and Jetty dependencies are 
shaded in the final assembly.
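A simplified Python model can make the relocation idea concrete. It covers only the class-name rewriting that a Maven Shade Plugin `<relocation>` rule performs. The real plugin also rewrites bytecode references and resource entries inside the jar. The source packages `org.dmg.pmml` and `org.jpmml` belong to JPMML-Model. The `org.spark_project.*` targets, the `relocate`/`clashes` helpers and the `com.example...` class name are hypothetical and appear only for illustration.

```python
from typing import Iterable, List, Mapping

# Hypothetical relocation rules, modelled on the <pattern>/<shadedPattern>
# pairs of a Maven Shade Plugin <relocation> element. The target packages
# are illustrative assumptions, not Spark's actual build configuration.
RELOCATIONS: Mapping[str, str] = {
    "org.dmg.pmml": "org.spark_project.dmg.pmml",
    "org.jpmml": "org.spark_project.jpmml",
}


def relocate(class_name: str, rules: Mapping[str, str] = RELOCATIONS) -> str:
    """Return the shaded name of a fully qualified class name.

    A pattern matches only at a package boundary, so "org.jpmml.model.X"
    is relocated while "org.jpmmlfoo.X" is left alone.
    """
    # Try longer patterns first in case one pattern is a prefix of another.
    for pattern in sorted(rules, key=len, reverse=True):
        if class_name == pattern or class_name.startswith(pattern + "."):
            return rules[pattern] + class_name[len(pattern):]
    return class_name


def clashes(assembly: Iterable[str], application: Iterable[str]) -> List[str]:
    """List class names present in both the Spark assembly and the app jar."""
    return sorted(set(assembly) & set(application))


if __name__ == "__main__":
    spark_assembly = ["org.dmg.pmml.PMML", "org.apache.spark.ml.Pipeline"]
    # Hypothetical application jar bundling its own (newer) JPMML-Model copy.
    app_jar = ["org.dmg.pmml.PMML", "com.example.exporter.PipelineExporter"]

    print(clashes(spark_assembly, app_jar))
    shaded = [relocate(name) for name in spark_assembly]
    print(shaded)
    print(clashes(shaded, app_jar))
```

Before relocation, Spark's copy of `org.dmg.pmml.PMML` has the same name as the copy an external PMML exporter brings along. After relocation the two copies live under different names, so each side can use its own library version.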

This issue is raised in relation to the JPMML-SparkML project 
(https://github.com/jpmml/jpmml-sparkml), which provides PMML export 
capabilities for Spark ML Pipelines. Currently, application developers who wish 
to use it must tweak their application classpath, which assumes familiarity 
with build internals.

  was:
The Spark-MLlib module depends on the JPMML-Model library 
(org.jpmml:pmml-model:1.2.7) for its PMML export capabilities. The JPMML-Model 
library is included in the Apache Spark assembly, which makes it very difficult 
to build and deploy competing PMML exporters that may wish to depend on 
different versions (typically much newer) of the same library.

JPMML-Model library classes are not part of Apache Spark public APIs, so it 
shouldn't be a problem if they are relocated by prepending a prefix 
"org.spark_project" to their package names using Maven Shade Plugin. The 
requested treatment is identical to how Google Guava and Jetty dependencies are 
shaded in the final assembly.

This issue is raised in relation to the JPMML-SparkML project 
(https://github.com/jpmml/jpmml-sparkml), which provides PMML export 
capabilities for Spark ML Pipelines. Currently, application developers who wish 
to use it must tweak their application classpath, which is assumes familiarity 
with build internals.


> Shade JPMML
> -----------
>
>                 Key: SPARK-15526
>                 URL: https://issues.apache.org/jira/browse/SPARK-15526
>             Project: Spark
>          Issue Type: Dependency upgrade
>          Components: ML, MLlib
>    Affects Versions: 2.0.0
>            Reporter: Villu Ruusmann
>            Priority: Minor
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
