[ https://issues.apache.org/jira/browse/SPARK-28785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17315025#comment-17315025 ]
Sergey Grigorev commented on SPARK-28785:
-----------------------------------------
I also see this error from time to time in my unit tests; sometimes it's
Vector[Double], sometimes "Schema for type B is not supported". Is there a way
to build a stable serializer that is resolved at compile time?
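One shape such a workaround might take (untested sketch; MyRow is a
hypothetical stand-in for the real row type) is to derive the encoder once,
eagerly, instead of letting each call site re-run the implicit
reflection-based derivation:

{code:java}
import org.apache.spark.sql.{Encoder, Encoders}

// Hypothetical row type, standing in for whatever the test actually encodes.
case class MyRow(word: String, vector: Array[Float])

// Derive the encoder once, up front, rather than re-running the implicit
// ScalaReflection-based derivation at every Dataset call site.
implicit val myRowEncoder: Encoder[MyRow] = Encoders.product[MyRow]

// Alternative: Encoders.kryo bypasses Catalyst schema reflection entirely,
// at the cost of storing each row as a single opaque binary column.
// implicit val myRowKryoEncoder: Encoder[MyRow] = Encoders.kryo[MyRow]
{code}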
> Schema for type scala.Array[Float] is not supported
> ---------------------------------------------------
>
> Key: SPARK-28785
> URL: https://issues.apache.org/jira/browse/SPARK-28785
> Project: Spark
> Issue Type: Bug
> Components: ML
> Affects Versions: 2.3.3
> Environment: *Spark version:* 2.3.3
> *OS:* CentOS Linux release 7.3.1611
> *Kernel:* 3.10.0-862.14.4.el7.x86_64
> *Java:* openjdk version "1.8.0_151"
> Reporter: Jigao Fu
> Priority: Major
>
> I build my application with Spark ML and wrote some test cases based on the
> examples at
> [https://spark.apache.org/docs/latest/ml-features.html|https://spark.apache.org/docs/latest/ml-features.html#word2vec]
> to verify that my application is compatible with Spark's transformers.
>
> After upgrading Spark from 2.1.1 to 2.3.3, something strange happened. I
> train a Pipeline model that contains a Word2Vec transformer and save the
> model to local disk. Most of the time this works fine, but sometimes it
> fails with an UnsupportedOperationException:
>
> Code:
>
> {code:java}
> import org.apache.spark.ml.Pipeline
> import org.apache.spark.ml.feature.Word2Vec
>
> // Input documents, one tokenized sentence per row.
> val data = spark.createDataFrame(Seq(
>   (1, "Hi I heard about Spark".split(" ")),
>   (2, "I wish Java could use case classes".split(" ")),
>   (3, "Logistic regression models are neat".split(" "))
> )).toDF("label", "text")
>
> // transformers
> val word2Vec = new Word2Vec()
>   .setInputCol("text")
>   .setOutputCol("result")
>   .setVectorSize(3)
>   .setMinCount(0)
>
> val pipeline = new Pipeline().setStages(Array(word2Vec))
> val model = pipeline.fit(data)
> model.write.overwrite.save("./model_data")
> // Then my application will read the model data file...{code}
>
>
> Exception:
>
> {code:java}
> java.lang.UnsupportedOperationException: Schema for type scala.Array[Float] is not supported
>   at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:780)
>   at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:715)
>   at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
>   at org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:824)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:39)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:714)
>   at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$deserializerFor$1.getPath$1(ScalaReflection.scala:173)
>   at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$deserializerFor$1.apply(ScalaReflection.scala:298)
>   at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$deserializerFor$1.apply(ScalaReflection.scala:150)
>   at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
>   at org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:824)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:39)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$deserializerFor(ScalaReflection.scala:150)
>   at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$deserializerFor$1$$anonfun$7.apply(ScalaReflection.scala:386)
>   at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$deserializerFor$1$$anonfun$7.apply(ScalaReflection.scala:380)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$deserializerFor$1.apply(ScalaReflection.scala:380)
>   at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$deserializerFor$1.apply(ScalaReflection.scala:150)
>   at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)
>   at org.apache.spark.sql.catalyst.ScalaReflection$class.cleanUpReflectionObjects(ScalaReflection.scala:824)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:39)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$deserializerFor(ScalaReflection.scala:150)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.deserializerFor(ScalaReflection.scala:138)
>   at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:72)
>   at org.apache.spark.sql.Encoders$.product(Encoders.scala:275)
>   at org.apache.spark.sql.LowPrioritySQLImplicits$class.newProductEncoder(SQLImplicits.scala:248)
>   at org.apache.spark.sql.SQLImplicits.newProductEncoder(SQLImplicits.scala:34)
>   at org.apache.spark.ml.feature.Word2VecModel$Word2VecModelWriter.saveImpl(Word2Vec.scala:347)
>   at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:103)
>   at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$saveImpl$1.apply(Pipeline.scala:254)
>   at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$saveImpl$1.apply(Pipeline.scala:253)
>   at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>   at org.apache.spark.ml.Pipeline$SharedReadWrite$.saveImpl(Pipeline.scala:253)
>   at org.apache.spark.ml.PipelineModel$PipelineModelWriter.saveImpl(Pipeline.scala:338)
>   at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:103)
> {code}
>
> Any idea how to figure out the root cause of this error? Thanks!