[ https://issues.apache.org/jira/browse/HIVEMALL-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16400009#comment-16400009 ]

Makoto Yui commented on HIVEMALL-60:
------------------------------------

[~maropu] What's the progress?

> java.io.NotSerializableException if you call each_top_k by using an internal API
> --------------------------------------------------------------------------------
>
>                 Key: HIVEMALL-60
>                 URL: https://issues.apache.org/jira/browse/HIVEMALL-60
>             Project: Hivemall
>          Issue Type: Bug
>            Reporter: Takeshi Yamamuro
>            Priority: Major
>              Labels: Spark
>
> If you run the code below, you get an exception:
> {code}
> val df = spark.range(10).selectExpr(s"id % 3 AS key", "rand() AS x", "CAST(id AS STRING) AS value")
> val resultDf = df.each_top_k(lit(100), $"x".as("score"), $"key")
> // Run the operations above via an internal API
> resultDf.queryExecution.executedPlan.execute().foreach(x => {})
> Caused by: java.io.NotSerializableException: scala.collection.Iterator$$anon$12
> Serialization stack:
>         - object not serializable (class: scala.collection.Iterator$$anon$12, value: empty iterator)
>         - field (class: scala.collection.Iterator$$anonfun$toStream$1, name: $outer, type: interface scala.collection.Iterator)
>         - object (class scala.collection.Iterator$$anonfun$toStream$1, <function0>)
>         - writeObject data (class: scala.collection.immutable.List$SerializationProxy)
>         - object (class scala.collection.immutable.List$SerializationProxy, scala.collection.immutable.List$SerializationProxy@4c4ec306)
>         - writeReplace data (class: scala.collection.immutable.List$SerializationProxy)
>         - object (class scala.collection.immutable.$colon$colon, List(org.apache.spark.OneToOneDependency@434fbf49))
>         - field (class: org.apache.spark.rdd.RDD, name: org$apache$spark$rdd$RDD$$dependencies_, type: interface scala.collection.Seq)
>         - object (class org.apache.spark.rdd.MapPartitionsRDD, MapPartitionsRDD[7] at execute at <console>:31)
>         - field (class: scala.Tuple2, name: _1, type: class java.lang.Object)
>         - object (class scala.Tuple2, (MapPartitionsRDD[7] at execute at <console>:31,<function2>))
>   at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
>   at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
>   at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
> {code}
> Although users do not usually call the API this way, it'd be better to fix this.
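The mechanism behind the stack trace can be reproduced outside Spark. The trace points at `Iterator$$anonfun$toStream$1` with an iterator in its `$outer` field: a `Stream` built from an iterator evaluates only its head, so the unevaluated tail is a closure that still references the (non-serializable) iterator. The sketch below is only an illustration of that mechanism (observed on Scala 2.11/2.12), not Hivemall's actual fix; `javaSerializable`, `lazyTail`, and `strict` are names invented here:

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Returns true if plain Java serialization of `obj` succeeds.
def javaSerializable(obj: AnyRef): Boolean =
  try {
    new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
    true
  } catch {
    case _: NotSerializableException => false
  }

// toStream evaluates only the head; the lazy tail is a closure whose
// $outer is the source iterator, which is not Serializable -- the same
// Iterator$$anonfun$toStream$1 frame shown in the stack trace above.
val lazyTail = Iterator(1, 2, 3).toStream

// Forcing the elements into a strict collection drops the iterator
// reference, so the result serializes cleanly.
val strict = Iterator(1, 2, 3).toList
```

Materializing the offending sequence eagerly (e.g. `toList` instead of `toStream`) before it is captured by a task is the usual way out of this class of error.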



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
