[ 
https://issues.apache.org/jira/browse/SPARK-27367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822724#comment-16822724
 ] 

Liang-Chi Hsieh commented on SPARK-27367:
-----------------------------------------

I changed the Spark code to use the new API after upgrading to the new 
version of RoaringBitmap.

The size of the bitmap also depends on the sparseness and distribution of 
empty blocks. I don't have a real workload that produces a big bitmap, so I 
manually created a HighlyCompressedMapStatus and benchmarked serializing and 
deserializing the bitmap inside it. I passed a very large block sizes array to 
the HighlyCompressedMapStatus; I don't think we would actually set that many 
partitions (100000000) on the reduce side. Even with this bitmap, I see only a 
small performance difference (9ms vs. 6ms) between the old and new serde APIs.

{code}
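// Assumes this runs inside Spark's own test sources (e.g. a suite under
// org.apache.spark), since some of these names are private[spark].
import org.apache.spark.SparkConf
import org.apache.spark.internal.config.Kryo._
import org.apache.spark.scheduler.HighlyCompressedMapStatus
import org.apache.spark.serializer.KryoSerializer
import org.apache.spark.storage.BlockManagerId

val conf = new SparkConf(false)
conf.set(KRYO_REGISTRATION_REQUIRED, true)
val ser = new KryoSerializer(conf).newInstance()

// Even-indexed blocks are empty (size 0); odd-indexed blocks are non-empty.
val blockSizes = (0L until 100000000L).map { i =>
  if (i % 2 == 0) 0L else i
}.toArray

val serialized = ser.serialize(
  HighlyCompressedMapStatus(BlockManagerId("exec-1", "host", 1234), blockSizes))
// Explicit type parameter so the result type is not inferred as Nothing.
ser.deserialize[HighlyCompressedMapStatus](serialized)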
{code}
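
For readers who want to compare the two serde paths outside of Spark, here is a minimal sketch. It assumes that the "new API" means the ByteBuffer-based serialize/deserialize overloads available in RoaringBitmap 0.8.0 and later, and that the "old API" means the DataOutput/DataInput overloads. Whether the Spark change calls them exactly this way is not confirmed here. The object name RoaringSerdeSketch and its helper methods are hypothetical, and the bitmap is much smaller than the one in the benchmark.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.io.{DataInputStream, DataOutputStream}
import java.nio.ByteBuffer

import org.roaringbitmap.RoaringBitmap

// Hypothetical helper object, for illustration only; not part of Spark.
object RoaringSerdeSketch {

  // Marks every even index below numBlocks as an "empty block", mirroring
  // the alternating pattern used in the benchmark, at a smaller scale.
  def emptyBlocksBitmap(numBlocks: Int): RoaringBitmap = {
    val bitmap = new RoaringBitmap()
    var i = 0
    while (i < numBlocks) {
      bitmap.add(i)
      i += 2
    }
    bitmap
  }

  // Stream-based path: serialize(DataOutput) / deserialize(DataInput).
  def roundTripViaStreams(bitmap: RoaringBitmap): RoaringBitmap = {
    val bytes = new ByteArrayOutputStream(bitmap.serializedSizeInBytes())
    val out = new DataOutputStream(bytes)
    bitmap.serialize(out)
    out.flush()
    val copy = new RoaringBitmap()
    copy.deserialize(
      new DataInputStream(new ByteArrayInputStream(bytes.toByteArray)))
    copy
  }

  // ByteBuffer-based path: serialize(ByteBuffer) / deserialize(ByteBuffer).
  def roundTripViaByteBuffer(bitmap: RoaringBitmap): RoaringBitmap = {
    val buffer = ByteBuffer.allocate(bitmap.serializedSizeInBytes())
    bitmap.serialize(buffer)
    buffer.flip() // switch the buffer from writing to reading
    val copy = new RoaringBitmap()
    copy.deserialize(buffer)
    copy
  }

  def main(args: Array[String]): Unit = {
    val bitmap = emptyBlocksBitmap(1000000)
    // Both paths should reproduce an equal bitmap.
    assert(roundTripViaStreams(bitmap) == bitmap)
    assert(roundTripViaByteBuffer(bitmap) == bitmap)
  }
}
```

Wrapping each round trip in a timing loop, with warm-up iterations, would give a rough comparison of the two paths without going through Kryo.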





> Faster RoaringBitmap Serialization with v0.8.0
> ----------------------------------------------
>
>                 Key: SPARK-27367
>                 URL: https://issues.apache.org/jira/browse/SPARK-27367
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Imran Rashid
>            Priority: Major
>
> RoaringBitmap 0.8.0 adds faster serde, but also requires us to slightly 
> change how we call the serde routines to take advantage of it.  This is 
> probably a worthwhile optimization, as every shuffle map task with a large 
> number of partitions generates these bitmaps, and the driver especially has 
> to deserialize many of these messages.
> See 
> * https://github.com/apache/spark/pull/24264#issuecomment-479675572
> * https://github.com/RoaringBitmap/RoaringBitmap/pull/325
> * https://github.com/RoaringBitmap/RoaringBitmap/issues/319


