[
https://issues.apache.org/jira/browse/HIVEMALL-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16400009#comment-16400009
]
Makoto Yui commented on HIVEMALL-60:
------------------------------------
[~maropu] What's the progress?
> java.io.NotSerializableException if you call each_top_k by using an internal API
> --------------------------------------------------------------------------------
>
> Key: HIVEMALL-60
> URL: https://issues.apache.org/jira/browse/HIVEMALL-60
> Project: Hivemall
> Issue Type: Bug
> Reporter: Takeshi Yamamuro
> Priority: Major
> Labels: Spark
>
> If you run the code below, you get an exception:
> {code}
> val df = spark.range(10).selectExpr(s"id % 3 AS key", "rand() AS x", "CAST(id AS STRING) AS value")
> val resultDf = df.each_top_k(lit(100), $"x".as("score"), $"key")
> // Run the operations above by kicking an internal API
> resultDf.queryExecution.executedPlan.execute().foreach(x => {})
> Caused by: java.io.NotSerializableException: scala.collection.Iterator$$anon$12
> Serialization stack:
>     - object not serializable (class: scala.collection.Iterator$$anon$12, value: empty iterator)
>     - field (class: scala.collection.Iterator$$anonfun$toStream$1, name: $outer, type: interface scala.collection.Iterator)
>     - object (class scala.collection.Iterator$$anonfun$toStream$1, <function0>)
>     - writeObject data (class: scala.collection.immutable.List$SerializationProxy)
>     - object (class scala.collection.immutable.List$SerializationProxy, scala.collection.immutable.List$SerializationProxy@4c4ec306)
>     - writeReplace data (class: scala.collection.immutable.List$SerializationProxy)
>     - object (class scala.collection.immutable.$colon$colon, List(org.apache.spark.OneToOneDependency@434fbf49))
>     - field (class: org.apache.spark.rdd.RDD, name: org$apache$spark$rdd$RDD$$dependencies_, type: interface scala.collection.Seq)
>     - object (class org.apache.spark.rdd.MapPartitionsRDD, MapPartitionsRDD[7] at execute at <console>:31)
>     - field (class: scala.Tuple2, name: _1, type: class java.lang.Object)
>     - object (class scala.Tuple2, (MapPartitionsRDD[7] at execute at <console>:31,<function2>))
>     at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
>     at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
>     at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
> {code}
> Although users rarely call this internal API directly, it would be better to fix this.
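The failure mode in the stack trace, where Java serialization walks an object graph and hits a non-serializable field (here, a Scala Iterator captured inside the RDD's dependency list), can be reproduced in plain Scala without Spark. A minimal sketch; `Holder`, `SerializationDemo`, and `canSerialize` are hypothetical names for illustration, not Hivemall or Spark code:

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// A Serializable wrapper whose field may or may not itself be serializable.
// Java serialization traverses the whole object graph, so one bad field
// (in the reported trace, a scala.collection.Iterator) fails the whole write.
class Holder(val ref: AnyRef) extends Serializable

object SerializationDemo {
  /** Returns true if `obj` survives Java serialization. */
  def canSerialize(obj: AnyRef): Boolean =
    try {
      val oos = new ObjectOutputStream(new ByteArrayOutputStream())
      oos.writeObject(obj)
      oos.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    // java.lang.Object is not Serializable, so the traversal fails.
    println(canSerialize(new Holder(new Object)))
    // Materializing into a serializable collection (e.g. List) works.
    println(canSerialize(new Holder(List(1, 2, 3))))
  }
}
```

This suggests the usual fix for such bugs: materialize the lazily produced iterator (or otherwise avoid capturing it) before the enclosing object is handed to the serializer.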
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)