Aleksey Ponkin created SPARK-18252:
--------------------------------------
Summary: Compressed BloomFilters
Key: SPARK-18252
URL: https://issues.apache.org/jira/browse/SPARK-18252
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 2.0.1
Reporter: Aleksey Ponkin
Priority: Minor
Since version 2.0, Spark has a BloomFilter implementation,
org.apache.spark.util.sketch.BloomFilterImpl. I have noticed that the current
implementation uses a custom class, org.apache.spark.util.sketch.BitArray,
which allocates all of the memory for the filter up front. As a result, even a
filter with only a small number of inserted elements will be pretty large when
it needs to be serialized. Is there any interest in using
[RoaringBitmap|https://github.com/RoaringBitmap/RoaringBitmap] or
[javaewah|https://github.com/lemire/javaewah] to compress bloom filters, or
maybe compressing them during serialization?
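A minimal sketch of the second option (compressing during serialization), assuming a plain GZIP pass from java.util.zip instead of RoaringBitmap or javaewah. SimpleBitArray is a hypothetical stand-in for BitArray, not Spark's class or API. The sketch shows why a preallocated, sparsely populated bit array compresses well: almost all of its 64-bit words are zero.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressedBitArrayDemo {

    // Hypothetical stand-in for a bloom filter's bit array: every 64-bit word
    // is allocated in the constructor, no matter how few bits are ever set.
    static final class SimpleBitArray {
        final long[] words;

        SimpleBitArray(long numBits) {
            words = new long[(int) ((numBits + 63) / 64)];
        }

        void set(long index) {
            // Java masks a long shift distance to its low 6 bits.
            words[(int) (index >>> 6)] |= 1L << index;
        }

        // Uncompressed serialization: word count, then every word.
        byte[] toBytes() {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bos)) {
                out.writeInt(words.length);
                for (long w : words) {
                    out.writeLong(w);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return bos.toByteArray();
        }
    }

    // Compress already-serialized bytes with GZIP (an assumed codec choice).
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Inverse of gzip(), to show the compression is lossless.
    static byte[] gunzip(byte[] packed) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            return gz.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long numBits = 8_000_000L;
        SimpleBitArray bits = new SimpleBitArray(numBits);
        // Only a few hundred bits set, as in a filter with few inserted items.
        Random rnd = new Random(42);
        for (int i = 0; i < 300; i++) {
            bits.set(Math.floorMod(rnd.nextLong(), numBits));
        }
        byte[] raw = bits.toBytes();
        byte[] packed = gzip(raw);
        System.out.println("raw bytes:  " + raw.length); // 4 + 125000 * 8 = 1000004
        System.out.println("gzip bytes: " + packed.length);
        System.out.println("round trip ok: " + Arrays.equals(raw, gunzip(packed)));
    }
}
```

Compressing only at serialization time leaves the in-memory layout (and lookup cost) unchanged, but the filter still occupies its full size in memory; a compressed bitmap such as RoaringBitmap or javaewah would shrink the in-memory footprint as well.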
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)