My guess is that Kryo handles Maps generically, or relies on some mechanism that does, and that it iterates over all key/value pairs as part of serialization; since there aren't actually any entries in the map, the default-value wrapper gets dropped. Java serialization is a much more literal (and expensive) field-by-field serialization, which works here because there's no special treatment of Maps. I think you could register a custom serializer that handles this case, or work around it in your client code; a sketch of both follows. I know there have been other issues with Kryo and Map because, for example, sometimes a Map in an application is actually some non-serializable wrapper view.
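
For the custom-serializer route, here's a minimal sketch of a registrator that falls back to Kryo's built-in JavaSerializer for just the default-value wrapper, so the default function round-trips the same way it does under plain Java serialization. The registrator name is illustrative, and "scala.collection.immutable.Map$WithDefault" is the runtime class behind withDefaultValue on Scala 2.11; I haven't tested this, so treat it as a starting point:

  import com.esotericsoftware.kryo.Kryo
  import com.esotericsoftware.kryo.serializers.JavaSerializer
  import org.apache.spark.serializer.KryoRegistrator

  // Illustrative name; point spark.kryo.registrator at the
  // fully-qualified name of this class.
  class MapWithDefaultRegistrator extends KryoRegistrator {
    override def registerClasses(kryo: Kryo): Unit = {
      // Use Java serialization for the WithDefault wrapper only, so the
      // default function survives; everything else stays on Kryo.
      kryo.register(
        Class.forName("scala.collection.immutable.Map$WithDefault"),
        new JavaSerializer())
    }
  }

  val sc = new SparkContext(new SparkConf()
    .setAppName("bar")
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .set("spark.kryo.registrator", "MapWithDefaultRegistrator"))

The client-side workaround is simpler: don't depend on the default surviving serialization, and supply it at the call site instead, e.g.

  sc.parallelize(Seq(aMap)).map(_.getOrElse("a", 0L)).first
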
On Wed, Sep 28, 2016 at 3:18 AM, Maciej Szymkiewicz <[email protected]> wrote:
> Hi everyone,
>
> I suspect there is no point in submitting a JIRA to fix this (not a Spark
> issue?) but I would like to know if this problem is documented anywhere.
> Somehow Kryo is losing the default value during serialization:
>
> scala> import org.apache.spark.{SparkContext, SparkConf}
> import org.apache.spark.{SparkContext, SparkConf}
>
> scala> val aMap = Map[String, Long]().withDefaultValue(0L)
> aMap: scala.collection.immutable.Map[String,Long] = Map()
>
> scala> aMap("a")
> res6: Long = 0
>
> scala> val sc = new SparkContext(new
> SparkConf().setAppName("bar").set("spark.serializer",
> "org.apache.spark.serializer.KryoSerializer"))
>
> scala> sc.parallelize(Seq(aMap)).map(_("a")).first
> 16/09/28 09:13:47 ERROR Executor: Exception in task 2.0 in stage 2.0 (TID 7)
> java.util.NoSuchElementException: key not found: a
>
> while the Java serializer works just fine:
>
> scala> val sc = new SparkContext(new
> SparkConf().setAppName("bar").set("spark.serializer",
> "org.apache.spark.serializer.JavaSerializer"))
>
> scala> sc.parallelize(Seq(aMap)).map(_("a")).first
> res9: Long = 0
>
> --
> Best regards,
> Maciej
