[ 
https://issues.apache.org/jira/browse/SPARK-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950651#comment-14950651
 ] 

Charles Allen commented on SPARK-11016:
---------------------------------------

[~srowen] As mentioned in 
https://issues.apache.org/jira/browse/SPARK-5949?focusedCommentId=14949819&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14949819
 Spark is relying on native Kryo serde for the RoaringBitmap classes in 
KryoSerializer: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala#L368
 including the protected Element class: 
https://github.com/lemire/RoaringBitmap/blob/RoaringBitmap-0.4.5/src/main/java/org/roaringbitmap/RoaringArray.java#L361
 which was removed in 0.5.0 and later (Spark is currently on 0.4.5).

The serde method sanctioned by the RoaringBitmap library is to use the 
serialize and deserialize methods provided by RoaringBitmap or RoaringArray. 
Registering the protected class causes conflicts whenever version 0.5.0 or 
later of the RoaringBitmap library is on the classpath, because Spark 
unavoidably fails while trying to register everything in 
org.apache.spark.serializer.KryoSerializer#toRegister, including the 
protected static inner class that no longer exists.

I took a quick stab at a patch locally by registering RoaringBitmap and 
RoaringArray with a com.esotericsoftware.kryo.Serializer, but it is not clear 
how closely Kryo's Input and Output match DataInput / DataOutput, so a 
bridging approach might violate the contract of one or the other.
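For concreteness, the bridging idea can be sketched with plain JDK streams. FakeBitmap below is a hypothetical stand-in for RoaringBitmap's serialize(DataOutput) / deserialize(DataInput) contract (the real class lives in org.roaringbitmap), and the byte streams stand in for Kryo's Output/Input; whether wrapping Kryo's streams this way honors both contracts (buffering, position tracking) is exactly the open question above.

```java
import java.io.*;
import java.util.Arrays;

// Hypothetical stand-in for RoaringBitmap's serialize(DataOutput) /
// deserialize(DataInput) contract.
class FakeBitmap {
    final int[] values;
    FakeBitmap(int[] values) { this.values = values; }

    // Mirrors RoaringBitmap.serialize(DataOutput)
    void serialize(DataOutput out) throws IOException {
        out.writeInt(values.length);
        for (int v : values) out.writeInt(v);
    }

    // Mirrors RoaringBitmap.deserialize(DataInput)
    static FakeBitmap deserialize(DataInput in) throws IOException {
        int n = in.readInt();
        int[] vs = new int[n];
        for (int i = 0; i < n; i++) vs[i] = in.readInt();
        return new FakeBitmap(vs);
    }
}

public class BridgeSketch {
    public static void main(String[] args) throws IOException {
        // Write side: what a custom Kryo Serializer's write(kryo, output, obj)
        // would do -- bridge the underlying stream to DataOutput.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        new FakeBitmap(new int[]{1, 5, 42})
            .serialize(new DataOutputStream(sink));

        // Read side: bridge back to DataInput for deserialize, as
        // read(kryo, input, type) would.
        FakeBitmap copy = FakeBitmap.deserialize(new DataInputStream(
            new ByteArrayInputStream(sink.toByteArray())));

        System.out.println(Arrays.toString(copy.values)); // prints [1, 5, 42]
    }
}
```

The real serializer would obtain the stream from Kryo's Output/Input rather than a ByteArrayOutputStream, which is where the contract questions come in.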

> Spark fails when running with a task that requires a more recent version of 
> RoaringBitmaps
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-11016
>                 URL: https://issues.apache.org/jira/browse/SPARK-11016
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.4.0
>            Reporter: Charles Allen
>
> The following error appears during Kryo initialization whenever a job 
> requires version 0.5.0 or later of RoaringBitmap, since 
> org/roaringbitmap/RoaringArray$Element was removed in 0.5.0:
> {code}
> A needed class was not found. This could be due to an error in your runpath. 
> Missing class: org/roaringbitmap/RoaringArray$Element
> java.lang.NoClassDefFoundError: org/roaringbitmap/RoaringArray$Element
>       at 
> org.apache.spark.serializer.KryoSerializer$.<init>(KryoSerializer.scala:338)
>       at 
> org.apache.spark.serializer.KryoSerializer$.<clinit>(KryoSerializer.scala)
>       at 
> org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:93)
>       at 
> org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:237)
>       at 
> org.apache.spark.serializer.KryoSerializerInstance.<init>(KryoSerializer.scala:222)
>       at 
> org.apache.spark.serializer.KryoSerializer.newInstance(KryoSerializer.scala:138)
>       at 
> org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:201)
>       at 
> org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:102)
>       at 
> org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:85)
>       at 
> org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
>       at 
> org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:63)
>       at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1318)
>       at 
> org.apache.spark.SparkContext$$anonfun$hadoopFile$1.apply(SparkContext.scala:1006)
>       at 
> org.apache.spark.SparkContext$$anonfun$hadoopFile$1.apply(SparkContext.scala:1003)
>       at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>       at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
>       at org.apache.spark.SparkContext.withScope(SparkContext.scala:700)
>       at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:1003)
>       at 
> org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:818)
>       at 
> org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:816)
>       at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>       at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
>       at org.apache.spark.SparkContext.withScope(SparkContext.scala:700)
>       at org.apache.spark.SparkContext.textFile(SparkContext.scala:816)
> {code}
> See https://issues.apache.org/jira/browse/SPARK-5949 for related info



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
