[ https://issues.apache.org/jira/browse/SPARK-28902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920478#comment-16920478 ]
Junichi Koizumi commented on SPARK-28902:
-------------------------------------------

Could you tell me a little more about the workaround? It turns out to work fine on my version.

pyspark:
{code:python}
>>> from pyspark.ml import Pipeline
>>> from pyspark.ml.feature import Tokenizer
>>> t = Tokenizer()
>>> p = Pipeline().setStages([t])
>>> d = spark.createDataFrame([["Apache spark logistic regression "]])
>>> pm = p.fit(d)
>>> np = Pipeline().setStages([pm])
>>> npm = np.fit(d)
>>> npm.write().save('./npm_test')
{code}

scala side:
{code:scala}
scala> import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.PipelineModel

scala> val pp = PipelineModel.load("./npm_test")
pp: org.apache.spark.ml.PipelineModel = PipelineModel_4d879f6b2b02c8d3d467
{code}

> Spark ML Pipeline with nested Pipelines fails to load when saved from Python
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-28902
>                 URL: https://issues.apache.org/jira/browse/SPARK-28902
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.4.3
>            Reporter: Saif Addin
>            Priority: Minor
>
> Hi, this error is affecting a number of our nested use cases.
> Saving a *PipelineModel* with one of its stages being another *PipelineModel* fails when loading it from Scala if it was saved in Python.
> *Python side:*
>
> {code:java}
> from pyspark.ml import Pipeline
> from pyspark.ml.feature import Tokenizer
> t = Tokenizer()
> p = Pipeline().setStages([t])
> d = spark.createDataFrame([["Hello Peter Parker"]])
> pm = p.fit(d)
> np = Pipeline().setStages([pm])
> npm = np.fit(d)
> npm.write().save('./npm_test')
> {code}
>
> *Scala side:*
>
> {code:java}
> scala> import org.apache.spark.ml.PipelineModel
> scala> val pp = PipelineModel.load("./npm_test")
> java.lang.IllegalArgumentException: requirement failed: Error loading metadata: Expected class name org.apache.spark.ml.PipelineModel but found class name pyspark.ml.pipeline.PipelineModel
>   at scala.Predef$.require(Predef.scala:224)
>   at org.apache.spark.ml.util.DefaultParamsReader$.parseMetadata(ReadWrite.scala:638)
>   at org.apache.spark.ml.util.DefaultParamsReader$.loadMetadata(ReadWrite.scala:616)
>   at org.apache.spark.ml.Pipeline$SharedReadWrite$.load(Pipeline.scala:267)
>   at org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:348)
>   at org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:342)
>   at org.apache.spark.ml.util.MLReadable$class.load(ReadWrite.scala:380)
>   at org.apache.spark.ml.PipelineModel$.load(Pipeline.scala:332)
>   ... 50 elided
> {code}

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
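The stack trace shows the Scala reader rejecting the class name recorded in the saved metadata (`pyspark.ml.pipeline.PipelineModel` instead of `org.apache.spark.ml.PipelineModel`). One workaround people have used, sketched below with a hypothetical helper `patch_metadata`, is to rewrite that field on disk before loading from Scala. This assumes the on-disk layout written by Spark 2.4's DefaultParamsWriter: each stage directory holds a one-line JSON document under `metadata/part-*` with a `"class"` key; it is a sketch, not an official fix, and it cannot help when an inner stage has no JVM counterpart.

```python
import json
import os

# Python-side class names mapped to the JVM class names the Scala loader
# expects. Extend as needed for other nested types (assumed mapping).
CLASS_MAP = {
    "pyspark.ml.pipeline.PipelineModel": "org.apache.spark.ml.PipelineModel",
    "pyspark.ml.pipeline.Pipeline": "org.apache.spark.ml.Pipeline",
}

def patch_metadata(root):
    """Walk a saved pipeline directory (including nested stage
    directories) and rewrite any Python class name in a metadata part
    file to its JVM equivalent. Returns the number of files patched."""
    patched = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        # Spark writes stage metadata under a directory literally
        # named "metadata"; skip everything else (data files, etc.).
        if os.path.basename(dirpath) != "metadata":
            continue
        for name in filenames:
            if not name.startswith("part-"):
                continue
            path = os.path.join(dirpath, name)
            with open(path) as f:
                meta = json.loads(f.read())
            cls = meta.get("class")
            if cls in CLASS_MAP:
                meta["class"] = CLASS_MAP[cls]
                with open(path, "w") as f:
                    f.write(json.dumps(meta))
                patched += 1
    return patched
```

After `npm.write().save('./npm_test')` on the Python side, running `patch_metadata('./npm_test')` before calling `PipelineModel.load` from Scala would make the class-name check pass; whether the rest of the load succeeds still depends on every inner stage being loadable from the JVM.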