Takeshi Yamamuro created HIVEMALL-60:
----------------------------------------
Summary: java.io.NotSerializableException if you call each_top_k
by using an internal API
Key: HIVEMALL-60
URL: https://issues.apache.org/jira/browse/HIVEMALL-60
Project: Hivemall
Issue Type: Bug
Reporter: Takeshi Yamamuro
If you run the code below, you get an exception:
{code}
val df = spark.range(10).selectExpr(s"id % 3 AS key", "rand() AS x", "CAST(id AS STRING) AS value")
val resultDf = df.each_top_k(lit(100), $"x".as("score"), $"key")
// Run the operations above by calling an internal API directly
resultDf.queryExecution.executedPlan.execute().foreach(x => {})
Caused by: java.io.NotSerializableException: scala.collection.Iterator$$anon$12
Serialization stack:
- object not serializable (class: scala.collection.Iterator$$anon$12, value: empty iterator)
- field (class: scala.collection.Iterator$$anonfun$toStream$1, name: $outer, type: interface scala.collection.Iterator)
- object (class scala.collection.Iterator$$anonfun$toStream$1, <function0>)
- writeObject data (class: scala.collection.immutable.List$SerializationProxy)
- object (class scala.collection.immutable.List$SerializationProxy, scala.collection.immutable.List$SerializationProxy@4c4ec306)
- writeReplace data (class: scala.collection.immutable.List$SerializationProxy)
- object (class scala.collection.immutable.$colon$colon, List(org.apache.spark.OneToOneDependency@434fbf49))
- field (class: org.apache.spark.rdd.RDD, name: org$apache$spark$rdd$RDD$$dependencies_, type: interface scala.collection.Seq)
- object (class org.apache.spark.rdd.MapPartitionsRDD, MapPartitionsRDD[7] at execute at <console>:31)
- field (class: scala.Tuple2, name: _1, type: class java.lang.Object)
- object (class scala.Tuple2, (MapPartitionsRDD[7] at execute at <console>:31,<function2>))
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
{code}
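The serialization stack above suggests that the non-serializable object is reached through a closure created by Iterator.toStream: a Stream evaluates only its head eagerly, and its unevaluated tail keeps a reference to the source iterator. The following is a minimal sketch of that mechanism in plain Scala, without Spark or Hivemall. It assumes Scala 2.11 or 2.12; the object name StreamSerializationSketch and the helper isJavaSerializable are hypothetical and exist only for this illustration. It does not show the actual each_top_k code path.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical helper object, for illustration only (not part of Hivemall or Spark).
object StreamSerializationSketch {

  // Returns true if `obj` can be written with plain Java serialization.
  def isJavaSerializable(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Lazy: only the head is evaluated, and the pending tail is a closure
    // that still references the (non-serializable) iterator.
    val lazySeq: Seq[Int] = Iterator.single(1).toStream
    println(s"toStream serializable: ${isJavaSerializable(lazySeq)}")

    // Strict: the iterator is fully consumed, so no reference to it remains.
    val strictSeq: Seq[Int] = Iterator.single(1).toList
    println(s"toList serializable: ${isJavaSerializable(strictSeq)}")
  }
}
```

If this is indeed the cause, materializing the sequence strictly before it is captured in the RDD lineage would avoid the exception. That is an assumption drawn from the stack trace, not a confirmed fix.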
Users rarely call the API this way, but it would still be better to fix
this.