[ https://issues.apache.org/jira/browse/SPARK-27367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822724#comment-16822724 ]
Liang-Chi Hsieh commented on SPARK-27367:
-----------------------------------------

I changed the Spark code to use the new API after upgrading to the new version of RoaringBitmap. The size of the bitmap also depends on the sparseness and distribution of empty blocks. I don't have a real workload that produces a big bitmap, so I manually created a HighlyCompressedMapStatus and benchmarked serializing and deserializing the bitmap inside it.

I passed a very large block sizes array to the HighlyCompressedMapStatus. I don't think we would ever set such a number of partitions (100000000) on the reduce side. Even with this bitmap, I see only a small performance difference (9ms vs. 6ms) between the old and new serde APIs.

{code}
val conf = new SparkConf(false)
conf.set(KRYO_REGISTRATION_REQUIRED, true)
val ser = new KryoSerializer(conf).newInstance()

// Every even-indexed block is empty (size 0), so half of the
// 100000000 blocks end up in the empty-blocks bitmap.
val blockSizes = (0L until 100000000L).map { i =>
  if (i % 2 == 0) {
    0L
  } else {
    i
  }
}.toArray

val serialized = ser.serialize(HighlyCompressedMapStatus(BlockManagerId("exec-1", "host", 1234), blockSizes))
// Explicit type parameter so the result type is not inferred as Nothing.
ser.deserialize[HighlyCompressedMapStatus](serialized)
{code}

> Faster RoaringBitmap Serialization with v0.8.0
> ----------------------------------------------
>
>                 Key: SPARK-27367
>                 URL: https://issues.apache.org/jira/browse/SPARK-27367
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Imran Rashid
>            Priority: Major
>
> RoaringBitmap 0.8.0 adds faster serde, but also requires us to change how we
> call the serde routines slightly to take advantage of it. This is probably a
> worthwhile optimization as every shuffle map task with a large # of
> partitions generates these bitmaps, and the driver especially has to
> deserialize many of these messages.
> See
> * https://github.com/apache/spark/pull/24264#issuecomment-479675572
> * https://github.com/RoaringBitmap/RoaringBitmap/pull/325
> * https://github.com/RoaringBitmap/RoaringBitmap/issues/319
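
The comment above reports 9ms vs. 6ms but does not show how the timings were taken. The following is a minimal sketch of one way to time a serialize/deserialize round trip, using only the JDK. It is not the harness used for the numbers in the comment. The names `SerdeTiming`, `meanMillis`, `roundTrip` and `emptyBlocks` are hypothetical, and `java.util.BitSet` with Java serialization stands in for RoaringBitmap with Kryo, so absolute numbers will differ from those reported.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import java.util.BitSet

object SerdeTiming {
  // Hypothetical helper: runs `body` `warmup` times untimed (to let the JIT
  // settle), then returns the mean wall-clock time in milliseconds over
  // `runs` timed iterations.
  def meanMillis(warmup: Int, runs: Int)(body: => Unit): Double = {
    var i = 0
    while (i < warmup) { body; i += 1 }
    val start = System.nanoTime()
    i = 0
    while (i < runs) { body; i += 1 }
    (System.nanoTime() - start) / 1e6 / runs
  }

  // Stand-in round trip: Java serialization of a java.util.BitSet.
  // Assumption: this replaces the Kryo + RoaringBitmap path from the comment.
  def roundTrip(bits: BitSet): BitSet = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(bits)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[BitSet] finally in.close()
  }

  // Marks every even index as an empty block, mirroring the shape of the
  // block sizes array in the comment (even indices have size 0).
  def emptyBlocks(n: Int): BitSet = {
    val bits = new BitSet(n)
    var i = 0
    while (i < n) { bits.set(i); i += 2 }
    bits
  }

  def main(args: Array[String]): Unit = {
    val bits = emptyBlocks(1000000)
    val ms = meanMillis(warmup = 5, runs = 20) { roundTrip(bits) }
    println(f"mean round trip: $ms%.3f ms")
  }
}
```

Averaging over several runs after a warm-up matters here because a difference of a few milliseconds is easily swamped by JIT compilation and GC noise in a single measurement.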