[
https://issues.apache.org/jira/browse/IMPALA-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zoltán Borók-Nagy resolved IMPALA-13109.
----------------------------------------
Fix Version/s: Impala 4.5.0
Resolution: Fixed
> Use RoaringBitmap in IcebergDeleteNode
> --------------------------------------
>
> Key: IMPALA-13109
> URL: https://issues.apache.org/jira/browse/IMPALA-13109
> Project: IMPALA
> Issue Type: Improvement
> Reporter: Zoltán Borók-Nagy
> Assignee: Zoltán Borók-Nagy
> Priority: Major
> Labels: impala-iceberg
> Fix For: Impala 4.5.0
>
>
> IcebergDeleteNode currently uses an ordered int64_t array for each data file
> to hold the deleted positions. This can consume a significant amount of memory
> when there are lots of deleted records.
> E.g. 100 million delete records, at 8 bytes each, consume about 800 MB
> (roughly 763 MiB) of memory.
> RoaringBitmap is a highly compressed and highly efficient data structure to
> store bitmaps:
> [https://arxiv.org/pdf/1603.06549]
> [https://github.com/RoaringBitmap/CRoaring]
> We could use it to store the deleted file positions instead of the sorted
> arrays, as
> * it consumes significantly less memory
> * it makes the code simpler
> * it *_might_* have perf benefits
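
To illustrate the memory argument above, here is a minimal, self-contained sketch of the idea behind a Roaring-style bitmap. It is not the CRoaring API and not Impala's implementation: the names DeletedPositions, SortedArrayBytes, kMaxArraySize and kBitsetWords are hypothetical, and the run containers that real Roaring bitmaps also use are omitted. Positions are split into a high key and a 16-bit low part; each key owns a small sorted array while it is sparse and switches to a fixed 8 KiB bitset once it holds more than 4096 entries.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Bytes used by the sorted-array approach for n positions: one int64_t each.
inline std::size_t SortedArrayBytes(std::size_t n) {
  return n * sizeof(std::int64_t);
}

// Hypothetical, simplified Roaring-style set of deleted row positions.
class DeletedPositions {
 public:
  void Add(std::uint64_t pos) {
    Container& c = containers_[pos >> 16];  // high bits select the container
    const std::uint16_t low = static_cast<std::uint16_t>(pos & 0xFFFF);
    if (!c.words.empty()) {  // dense container: set one bit
      c.words[low >> 6] |= std::uint64_t{1} << (low & 63);
      return;
    }
    // Sparse container: keep the 16-bit values sorted and unique.
    auto it = std::lower_bound(c.sorted.begin(), c.sorted.end(), low);
    if (it != c.sorted.end() && *it == low) return;
    c.sorted.insert(it, low);
    if (c.sorted.size() > kMaxArraySize) ConvertToBitset(c);
  }

  bool Contains(std::uint64_t pos) const {
    auto it = containers_.find(pos >> 16);
    if (it == containers_.end()) return false;
    const Container& c = it->second;
    const std::uint16_t low = static_cast<std::uint16_t>(pos & 0xFFFF);
    if (!c.words.empty()) {
      return ((c.words[low >> 6] >> (low & 63)) & 1u) != 0;
    }
    return std::binary_search(c.sorted.begin(), c.sorted.end(), low);
  }

  // Payload only; ignores std::map and std::vector bookkeeping overhead.
  std::size_t PayloadBytes() const {
    std::size_t bytes = 0;
    for (const auto& kv : containers_) {
      bytes += kv.second.sorted.size() * sizeof(std::uint16_t);
      bytes += kv.second.words.size() * sizeof(std::uint64_t);
    }
    return bytes;
  }

 private:
  static constexpr std::size_t kMaxArraySize = 4096;       // array -> bitset threshold
  static constexpr std::size_t kBitsetWords = 65536 / 64;  // 1024 words = 8 KiB

  struct Container {
    std::vector<std::uint16_t> sorted;  // used while the container is sparse
    std::vector<std::uint64_t> words;   // non-empty once the container is dense
  };

  static void ConvertToBitset(Container& c) {
    c.words.assign(kBitsetWords, 0);
    for (std::uint16_t v : c.sorted) {
      c.words[v >> 6] |= std::uint64_t{1} << (v & 63);
    }
    c.sorted.clear();
    c.sorted.shrink_to_fit();
  }

  std::map<std::uint64_t, Container> containers_;
};
```

In this sketch a fully deleted chunk of 65,536 positions costs 8 KiB, versus 512 KiB for the same chunk stored as int64_t values, while an isolated deleted position costs 2 bytes of payload instead of 8. Membership checks stay cheap: one map lookup followed by either a bit test or a binary search over at most 4096 entries.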
--
This message was sent by Atlassian Jira
(v8.20.10#820010)