[jira] [Commented] (SPARK-24632) Allow 3rd-party libraries to use pyspark.ml abstractions for Java wrappers for persistence

Bryan Cutler (Jira) Mon, 28 Dec 2020 12:13:06 -0800


    [ https://issues.apache.org/jira/browse/SPARK-24632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255712#comment-17255712 ]


Bryan Cutler commented on SPARK-24632:
--------------------------------------

Ping [~huaxingao] in case you have some time to look into this.

> Allow 3rd-party libraries to use pyspark.ml abstractions for Java wrappers 
> for persistence
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-24632
>                 URL: https://issues.apache.org/jira/browse/SPARK-24632
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, PySpark
>    Affects Versions: 3.1.0
>            Reporter: Joseph K. Bradley
>            Priority: Major
>
> This is a follow-up for [SPARK-17025], which allowed users to implement 
> Python PipelineStages in 3rd-party libraries, include them in Pipelines, and 
> use Pipeline persistence.  This task is to make it easier for 3rd-party 
> libraries to have PipelineStages written in Java and then to use pyspark.ml 
> abstractions to create wrappers around those Java classes.  This is currently 
> possible, except that users hit bugs around persistence.
> I spent a bit of time thinking about this and wrote up thoughts and a 
> proposal in the doc linked below.  Summary of proposal:
> Require that 3rd-party libraries with Java classes with Python wrappers 
> implement a trait which provides the corresponding Python classpath in some 
> field:
> {code}
> trait PythonWrappable {
>   def pythonClassPath: String = …
> }
> class MyJavaType extends PythonWrappable
> {code}
> This will not be required for MLlib wrappers, which we can handle specially.
> One issue for this task will be that we may have trouble writing unit tests.  
> They would ideally test a Java class + Python wrapper class pair sitting 
> outside of pyspark.
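
For illustration only, here is a minimal sketch of how the Python side might
turn a string like the one returned by pythonClassPath into a wrapper class.
The issue does not say how pyspark would consume that value, so the helper
name resolve_python_class and the "module.ClassName" format are assumptions,
not existing pyspark API. A standard-library class stands in for a real
wrapper so the sketch runs without Spark installed.

```python
import importlib


def resolve_python_class(class_path: str) -> type:
    """Import and return the class named by a dotted path.

    Hypothetical helper (not part of pyspark). It assumes the path has the
    form 'package.module.ClassName'.
    """
    module_name, sep, class_name = class_path.rpartition(".")
    if not sep or not module_name or not class_name:
        raise ValueError(f"expected 'module.ClassName', got {class_path!r}")

    module = importlib.import_module(module_name)
    try:
        obj = getattr(module, class_name)
    except AttributeError as exc:
        raise ImportError(
            f"module {module_name!r} has no attribute {class_name!r}"
        ) from exc

    if not isinstance(obj, type):
        raise TypeError(f"{class_path!r} does not name a class")
    return obj


# Stand-in for a value a Java stage might report through pythonClassPath.
wrapper_cls = resolve_python_class("collections.OrderedDict")
print(wrapper_cls.__name__)
```

Failing early with a clear error when the path is malformed or the module is
missing seems preferable here, since a 3rd-party wrapper package may simply
not be installed on the machine that loads a saved Pipeline.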



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

