Zoltán Borók-Nagy created IMPALA-13109:
------------------------------------------

             Summary: Use RoaringBitmap in IcebergDeleteNode
                 Key: IMPALA-13109
                 URL: https://issues.apache.org/jira/browse/IMPALA-13109
             Project: IMPALA
          Issue Type: Improvement
            Reporter: Zoltán Borók-Nagy


IcebergDeleteNode currently uses a sorted int64_t array for each data file to
hold the deleted positions. This can consume a significant amount of memory
when there are many deleted records: at 8 bytes per position, 100 million
deleted positions take 800 MB (about 763 MiB).
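
For reference, here is a minimal sketch of the layout described above. It assumes one sorted std::vector<int64_t> per data file, probed with binary search. SortedDeletePositions and SortedArrayBytes are hypothetical names used only for illustration; this is not Impala's actual code. The static_assert checks the arithmetic of 8 bytes per position times 100 million positions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the per-data-file structure: a sorted array of
// deleted row positions (not Impala's real class).
struct SortedDeletePositions {
  std::vector<int64_t> positions;  // assumed to be kept sorted ascending

  // Binary search over the sorted positions: O(log n) per probe.
  bool IsDeleted(int64_t pos) const {
    return std::binary_search(positions.begin(), positions.end(), pos);
  }

  // Payload size only; ignores any unused vector capacity.
  size_t PayloadBytes() const { return positions.size() * sizeof(int64_t); }
};

// Bytes needed to store n deleted positions at 8 bytes each.
constexpr uint64_t SortedArrayBytes(uint64_t n) { return n * sizeof(int64_t); }

// 100 million positions * 8 bytes = 800,000,000 bytes (800 MB, ~763 MiB).
static_assert(SortedArrayBytes(100000000ULL) == 800000000ULL,
              "100M positions at 8 bytes each");
```

Memory in this layout grows linearly with the number of deleted positions, no matter how densely they are clustered within a file, which is the cost a compressed bitmap aims to reduce.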

RoaringBitmap is a highly compressed and efficient data structure for storing
bitmaps:
[https://arxiv.org/pdf/1603.06549]
[https://github.com/RoaringBitmap/CRoaring]
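
To make the idea concrete, here is a deliberately simplified sketch of the Roaring layout described in the paper. 32-bit values are split by their high 16 bits into chunks. Each chunk stores the low 16 bits either as a sorted uint16_t array (up to 4096 values) or, once denser, as a fixed 8 KiB bitmap. MiniRoaring and its methods are hypothetical names. This is not the CRoaring API, and it omits run containers, 64-bit values, removal, and the library's many optimizations.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

// Simplified illustration of Roaring's two basic container types.
class MiniRoaring {
 public:
  void Add(uint32_t x) {
    Container& c = containers_[static_cast<uint16_t>(x >> 16)];
    const uint16_t low = static_cast<uint16_t>(x & 0xFFFF);
    if (c.bitmap) {  // dense chunk: set one bit
      (*c.bitmap)[low >> 6] |= uint64_t{1} << (low & 63);
      return;
    }
    // Sparse chunk: insert into the sorted array, skipping duplicates.
    auto it = std::lower_bound(c.array.begin(), c.array.end(), low);
    if (it != c.array.end() && *it == low) return;
    c.array.insert(it, low);
    if (c.array.size() > kArrayMax) ConvertToBitmap(c);
  }

  bool Contains(uint32_t x) const {
    auto it = containers_.find(static_cast<uint16_t>(x >> 16));
    if (it == containers_.end()) return false;
    const Container& c = it->second;
    const uint16_t low = static_cast<uint16_t>(x & 0xFFFF);
    if (c.bitmap) return (((*c.bitmap)[low >> 6] >> (low & 63)) & 1) != 0;
    return std::binary_search(c.array.begin(), c.array.end(), low);
  }

  // Container payload only (2 bytes per array entry, 8192 bytes per bitmap);
  // ignores map and allocation overhead.
  size_t ApproxPayloadBytes() const {
    size_t bytes = 0;
    for (const auto& kv : containers_) {
      const Container& c = kv.second;
      bytes += c.bitmap ? sizeof(*c.bitmap) : c.array.size() * sizeof(uint16_t);
    }
    return bytes;
  }

 private:
  static constexpr size_t kArrayMax = 4096;  // array/bitmap threshold
  using Bitmap = std::array<uint64_t, 1024>;  // 65536 bits = 8 KiB

  struct Container {
    std::vector<uint16_t> array;     // used while the chunk is sparse
    std::unique_ptr<Bitmap> bitmap;  // non-null once the chunk is dense
  };

  static void ConvertToBitmap(Container& c) {
    c.bitmap = std::make_unique<Bitmap>();  // value-initialized to zeros
    for (uint16_t v : c.array) (*c.bitmap)[v >> 6] |= uint64_t{1} << (v & 63);
    c.array.clear();
    c.array.shrink_to_fit();
  }

  std::map<uint16_t, Container> containers_;  // keyed by high 16 bits
};
```

For example, adding positions 0 to 99,999 fills two chunks that both become bitmaps, for 16,384 bytes of payload, versus 800,000 bytes for the same positions in an int64_t array. Sparse positions cost about 2 bytes each plus per-chunk overhead.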

We could use it instead of the sorted arrays to store the deleted positions
of each data file, as it
 * consumes significantly less memory,
 * makes the code simpler, and
 * *_might_* have performance benefits.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
