[ https://issues.apache.org/jira/browse/SPARK-24632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph K. Bradley reassigned SPARK-24632:
-----------------------------------------

    Assignee: Joseph K. Bradley

> Allow 3rd-party libraries to use pyspark.ml abstractions for Java wrappers for persistence
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-24632
>                 URL: https://issues.apache.org/jira/browse/SPARK-24632
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, PySpark
>    Affects Versions: 2.4.0
>            Reporter: Joseph K. Bradley
>            Assignee: Joseph K. Bradley
>            Priority: Major
>
> This is a follow-up for [SPARK-17025], which allowed users to implement
> Python PipelineStages in 3rd-party libraries, include them in Pipelines, and
> use Pipeline persistence. This task is to make it easier for 3rd-party
> libraries to have PipelineStages written in Java and then to use pyspark.ml
> abstractions to create wrappers around those Java classes. This is currently
> possible, except that users hit bugs around persistence.
>
> Some fixes we'll need include:
> * an overridable method for converting between Python and Java classpaths.
>   See https://github.com/apache/spark/blob/b56e9c613fb345472da3db1a567ee129621f6bf3/python/pyspark/ml/util.py#L284
> * https://github.com/apache/spark/blob/4e7d8678a3d9b12797d07f5497e0ed9e471428dd/python/pyspark/ml/pipeline.py#L378
>
> One unusual thing for this task will be to write unit tests which test a
> custom PipelineStage written outside of the pyspark package.
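
As a rough illustration of the first fix listed in the issue, here is a minimal sketch of an overridable conversion between Python and Java classpaths. It is pure Python and does not import pyspark. The names ClasspathMapper, ThirdPartyMapper, to_java, to_python, mylib and com.example.mylib are all hypothetical and are not pyspark API. The default prefixes assume the usual convention that a class under the pyspark package has its Java counterpart under org.apache.spark.

```python
class ClasspathMapper:
    """Hypothetical helper (not part of pyspark) that maps class paths by prefix."""

    # Assumed default convention: pyspark.* <-> org.apache.spark.*
    python_prefix = "pyspark"
    java_prefix = "org.apache.spark"

    @classmethod
    def _swap_prefix(cls, path, old, new):
        # Only rewrite a leading package prefix, never a match in the middle.
        if path != old and not path.startswith(old + "."):
            raise ValueError("%r is not under package %r" % (path, old))
        return new + path[len(old):]

    @classmethod
    def to_java(cls, python_path):
        """Convert a fully qualified Python class path to a Java one."""
        return cls._swap_prefix(python_path, cls.python_prefix, cls.java_prefix)

    @classmethod
    def to_python(cls, java_path):
        """Convert a fully qualified Java class path to a Python one."""
        return cls._swap_prefix(java_path, cls.java_prefix, cls.python_prefix)


class ThirdPartyMapper(ClasspathMapper):
    """A 3rd-party library overrides only the prefixes (illustrative names)."""

    python_prefix = "mylib"
    java_prefix = "com.example.mylib"


if __name__ == "__main__":
    print(ClasspathMapper.to_java("pyspark.ml.feature.Tokenizer"))
    print(ThirdPartyMapper.to_java("mylib.feature.MyTransformer"))
    print(ThirdPartyMapper.to_python("com.example.mylib.feature.MyTransformer"))
```

Keeping the mapping in a single overridable place means a wrapper defined outside the pyspark package would not depend on a hard-coded prefix substitution when a stage is saved or loaded. How such a hook would actually be wired into pyspark.ml readers and writers is left open here.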