[ https://issues.apache.org/jira/browse/SPARK-24632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255712#comment-17255712 ]
Bryan Cutler commented on SPARK-24632:
--------------------------------------
Ping [~huaxingao] in case you have some time to look into this.
> Allow 3rd-party libraries to use pyspark.ml abstractions for Java wrappers
> for persistence
> ------------------------------------------------------------------------------------------
>
> Key: SPARK-24632
> URL: https://issues.apache.org/jira/browse/SPARK-24632
> Project: Spark
> Issue Type: Improvement
> Components: ML, PySpark
> Affects Versions: 3.1.0
> Reporter: Joseph K. Bradley
> Priority: Major
>
> This is a follow-up for [SPARK-17025], which allowed users to implement
> Python PipelineStages in 3rd-party libraries, include them in Pipelines, and
> use Pipeline persistence. This task is to make it easier for 3rd-party
> libraries to have PipelineStages written in Java and then to use pyspark.ml
> abstractions to create wrappers around those Java classes. This is currently
> possible, except that users hit bugs around persistence.
> I spent a bit of time thinking about this and wrote up thoughts and a proposal in the
> doc linked below. Summary of proposal:
> Require that 3rd-party Java classes which have Python wrappers implement a
> trait that exposes the class path of the corresponding Python wrapper in some
> field:
> {code}
> trait PythonWrappable {
>   // Class path of the Python wrapper that corresponds to this Java type.
>   def pythonClassPath: String = …
> }
> class MyJavaType extends PythonWrappable
> {code}
> This will not be required for MLlib wrappers, which we can handle specially.
> One issue for this task will be that we may have trouble writing unit tests.
> They would ideally test a Java class + Python wrapper class pair sitting
> outside of pyspark.
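
To make the quoted proposal concrete, here is a minimal Python sketch of how the Python side might turn the string returned by pythonClassPath into a wrapper class. It is not taken from the proposal doc or from pyspark: `resolve_python_class`, `wrapper_class_for` and `FakeJavaStage` are hypothetical names, and `FakeJavaStage` only stands in for a Java object that implements the proposed trait, so the sketch runs without a JVM or Spark.

```python
import importlib


def resolve_python_class(python_class_path):
    """Import and return the class named by a dotted path such as 'pkg.module.ClassName'."""
    module_name, _, class_name = python_class_path.rpartition(".")
    if not module_name:
        raise ValueError(
            "expected a dotted path like 'pkg.module.ClassName', got %r" % python_class_path
        )
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


class FakeJavaStage:
    """Hypothetical stand-in for a Java stage that implements PythonWrappable."""

    def __init__(self, python_class_path):
        self._python_class_path = python_class_path

    def pythonClassPath(self):
        # Mirrors the accessor named in the proposal; the real object would live in the JVM.
        return self._python_class_path


def wrapper_class_for(java_stage):
    """Look up the Python wrapper class that a Java stage advertises."""
    return resolve_python_class(java_stage.pythonClassPath())


# A standard-library class is used as the target so the example has no dependencies.
stage = FakeJavaStage("collections.OrderedDict")
wrapper_cls = wrapper_class_for(stage)
print(wrapper_cls.__name__)
```

With a lookup of this shape, a loader would not need to know 3rd-party package names in advance, since each Java class reports where its own wrapper lives.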
--
This message was sent by Atlassian Jira
(v8.3.4#803005)