Zoltán Borók-Nagy created IMPALA-13109:
------------------------------------------
Summary: Use RoaringBitmap in IcebergDeleteNode
Key: IMPALA-13109
URL: https://issues.apache.org/jira/browse/IMPALA-13109
Project: IMPALA
Issue Type: Improvement
Reporter: Zoltán Borók-Nagy
IcebergDeleteNode currently uses an ordered int64_t array for each data file to
hold the deleted positions. This can consume significant amount of memory when
there are lots of deleted records.
E.g. 100 Million delete records consume 800 MiB memory.
RoaringBitmap is a highly compressed and highly efficient data structure to
store bitmaps:
[https://arxiv.org/pdf/1603.06549]
[https://github.com/RoaringBitmap/CRoaring]
We could use it to store the deleted file positions instead of the sorted
arrays, as
* it consumes significantly less memory
* makes the code simpler
* *_might_* have perf benefits
--
This message was sent by Atlassian Jira
(v8.20.10#820010)