[
https://issues.apache.org/jira/browse/SPARK-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950651#comment-14950651
]
Charles Allen commented on SPARK-11016:
---------------------------------------
[~srowen] As mentioned in
https://issues.apache.org/jira/browse/SPARK-5949?focusedCommentId=14949819&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14949819
Spark relies on Kryo's default serialization for RoaringBitmap internals in
KryoSerializer:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala#L368
including the protected Element class:
https://github.com/lemire/RoaringBitmap/blob/RoaringBitmap-0.4.5/src/main/java/org/roaringbitmap/RoaringArray.java#L361
which was removed in 0.5.0 (Spark is currently on 0.4.5).
The SerDe method sanctioned by the RoaringBitmap library is to use the
serialize and deserialize methods provided by RoaringBitmap or RoaringArray
themselves. Registering the protected class causes a conflict whenever a 0.5.0
or later version of the RoaringBitmap library is on the classpath: Spark
unavoidably fails when it tries to register everything in
org.apache.spark.serializer.KryoSerializer#toRegister, including the protected
static inner class that no longer exists.
I took a quick jab at a patch locally by registering RoaringBitmap and
RoaringArray with a custom com.esotericsoftware.kryo.Serializer, but it is not
clear how close Kryo's Input and Output are to DataInput / DataOutput, so a
bridging approach might violate the contract of one or the other.
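A minimal sketch of that bridging idea, under the assumption that Kryo's Output/Input behave like ordinary byte streams (plain java.io byte-array streams stand in for them here, so the sketch is self-contained; the class name RoaringBridgeSketch and the int payload are illustrative, not from Spark or RoaringBitmap):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch, not Spark's actual patch. RoaringBitmap's sanctioned
// API is serialize(DataOutput) / deserialize(DataInput), while a custom
// com.esotericsoftware.kryo.Serializer hands you Kryo's Output/Input. If both
// sides are ultimately stream-backed, the bridge amounts to wrapping the Kryo
// stream in a DataOutputStream / DataInputStream.
public class RoaringBridgeSketch {
    public static void main(String[] args) throws IOException {
        int[] payload = {1, 5, 42};  // stand-in for a bitmap's contents

        // Write side: this is where RoaringBitmap.serialize(dataOut) would go.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        DataOutputStream dataOut = new DataOutputStream(sink);
        for (int v : payload) dataOut.writeInt(v);
        dataOut.flush();

        // Read side: this is where RoaringBitmap.deserialize(dataIn) would go.
        DataInputStream dataIn =
            new DataInputStream(new ByteArrayInputStream(sink.toByteArray()));
        for (int expected : payload) {
            if (dataIn.readInt() != expected) {
                throw new AssertionError("round-trip mismatch");
            }
        }
        System.out.println("round-trip ok");
    }
}
```

Whether such a wrapper honors the full DataOutput/DataInput contract against Kryo's real Output/Input (buffering, position tracking, partial reads) is exactly the open question above.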
> Spark fails when running with a task that requires a more recent version of
> RoaringBitmaps
> ------------------------------------------------------------------------------------------
>
> Key: SPARK-11016
> URL: https://issues.apache.org/jira/browse/SPARK-11016
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.4.0
> Reporter: Charles Allen
>
> The following error appears during Kryo initialization whenever a job
> requires a more recent version (0.5.0 or later) of RoaringBitmap;
> org/roaringbitmap/RoaringArray$Element was removed in 0.5.0.
> {code}
> A needed class was not found. This could be due to an error in your runpath.
> Missing class: org/roaringbitmap/RoaringArray$Element
> java.lang.NoClassDefFoundError: org/roaringbitmap/RoaringArray$Element
> at org.apache.spark.serializer.KryoSerializer$.<init>(KryoSerializer.scala:338)
> at org.apache.spark.serializer.KryoSerializer$.<clinit>(KryoSerializer.scala)
> at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:93)
> at org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:237)
> at org.apache.spark.serializer.KryoSerializerInstance.<init>(KryoSerializer.scala:222)
> at org.apache.spark.serializer.KryoSerializer.newInstance(KryoSerializer.scala:138)
> at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:201)
> at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:102)
> at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:85)
> at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
> at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:63)
> at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1318)
> at org.apache.spark.SparkContext$$anonfun$hadoopFile$1.apply(SparkContext.scala:1006)
> at org.apache.spark.SparkContext$$anonfun$hadoopFile$1.apply(SparkContext.scala:1003)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
> at org.apache.spark.SparkContext.withScope(SparkContext.scala:700)
> at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:1003)
> at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:818)
> at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:816)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
> at org.apache.spark.SparkContext.withScope(SparkContext.scala:700)
> at org.apache.spark.SparkContext.textFile(SparkContext.scala:816)
> {code}
> See https://issues.apache.org/jira/browse/SPARK-5949 for related info
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)