Aleksey Ponkin created SPARK-18252:
--------------------------------------

             Summary: Compressed BloomFilters
                 Key: SPARK-18252
                 URL: https://issues.apache.org/jira/browse/SPARK-18252
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 2.0.1
            Reporter: Aleksey Ponkin
            Priority: Minor


Since version 2.0, Spark has had a BloomFilter implementation: 
org.apache.spark.util.sketch.BloomFilterImpl. I have noticed that the current 
implementation uses a custom class, org.apache.spark.util.sketch.BitArray, 
which allocates all the memory for the filter up front. As a result, even a 
filter with only a few elements inserted is pretty large when it needs to be 
serialized. Is there any interest in using RoaringBitmap 
(https://github.com/RoaringBitmap/RoaringBitmap) or javaewah 
(https://github.com/lemire/javaewah) to compress bloom filters, or perhaps in 
compressing them during the serialization stage? 
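
To illustrate the size concern, here is a rough sketch that uses only the JDK. 
It does not touch Spark's BitArray or BloomFilterImpl. It assumes the filter's 
bits live in a plain long[] that is allocated up front, and it uses 
java.util.zip.Deflater as a stand-in for whatever compression might be applied 
at serialization time. The class name, method names, and sizes are all made up 
for the example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch: none of these names come from Spark.
final class SparseFilterSizeSketch {

    // Writes every word of the bit array as 8 bytes, regardless of how many bits are set.
    static byte[] rawBytes(long[] words) {
        ByteBuffer buf = ByteBuffer.allocate(words.length * Long.BYTES);
        buf.asLongBuffer().put(words);
        return buf.array();
    }

    // Compresses a byte array with DEFLATE (assumed stand-in for a real codec choice).
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Reverses deflate(); rejects truncated or malformed input.
    static byte[] inflate(byte[] input) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or invalid input");
                }
                out.write(chunk, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException(e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // One million bits allocated up front, as a filter sized for many items would be.
        long numBits = 1_000_000L;
        long[] words = new long[(int) ((numBits + 63) / 64)];

        // Set only a handful of bits, as if very few elements had been inserted.
        long[] positions = {7L, 4_242L, 99_999L, 500_000L, 999_999L};
        for (long p : positions) {
            words[(int) (p >>> 6)] |= 1L << p; // Java masks the shift count to 0..63
        }

        byte[] raw = rawBytes(words);
        byte[] packed = deflate(raw);
        System.out.println("raw bytes:      " + raw.length);
        System.out.println("deflated bytes: " + packed.length);
    }
}
```

The raw form always costs 8 bytes per allocated word, while the deflated form 
shrinks with the number of set bits. A compressed bitmap library would change 
the in-memory representation as well, which this sketch does not attempt. 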



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
