Ashwani Raina created KUDU-3734:
-----------------------------------
Summary: Rowset merge compaction does not consider undo delta size
while picking rowsets
Key: KUDU-3734
URL: https://issues.apache.org/jira/browse/KUDU-3734
Project: Kudu
Issue Type: Bug
Reporter: Ashwani Raina
Assignee: Ashwani Raina
Rowset merge compaction has a preliminary stage in which it examines all the
eligible rowsets in a tablet that fit the criteria for optimal compaction
with respect to budget, rowset width, range overlaps, etc.
It uses a fractional knapsack algorithm to come up with the list of rowsets that
can deliver the maximum bang for the buck. Although a rowset merge compaction
touches almost all parts of the data pertaining to the rows within a tablet
(base data, redo deltas, and undo deltas), it does not take the size of the
undo deltas into account when calculating the density of a rowset item in the
knapsack algorithm.
As a result, a rowset can be picked for rowset merge compaction even if its
undo deltas are huge. This can lead to OOM scenarios: if, say, the undo deltas
amount to GBs, then when the compaction starts reading the rows and the
corresponding uncompressed delta data into memory, the process may cross the
memory hard limit, forcing the OS to take action and kill the Kudu service.
One such example is KUDU-3406.
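
To illustrate the effect described above, here is a minimal Python sketch. It
is not Kudu's selection code: the Rowset fields, the value scores, the sizes,
and the pick_rowsets helper are all hypothetical, and the selection is a
simplified greedy ranking by density (value / weight) that takes whole rowsets,
standing in for the knapsack step. It only shows how leaving undo delta size
out of the weight can let a rowset with GBs of undo deltas rank first.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass
class Rowset:
    # All fields are hypothetical, for illustration only.
    name: str
    value: float      # assumed benefit score of compacting this rowset
    base_bytes: int   # size of base data
    redo_bytes: int   # size of redo deltas
    undo_bytes: int   # size of undo deltas


def weight(rs, include_undo):
    """Weight used for density; optionally counts undo delta size."""
    w = rs.base_bytes + rs.redo_bytes
    if include_undo:
        w += rs.undo_bytes
    return w


def pick_rowsets(rowsets, budget_bytes, include_undo):
    """Rank by density (value / weight), then take rowsets that fit the budget."""
    ranked = sorted(
        rowsets,
        key=lambda rs: rs.value / weight(rs, include_undo),
        reverse=True,
    )
    picked, remaining = [], budget_bytes
    for rs in ranked:
        w = weight(rs, include_undo)
        if w <= remaining:
            picked.append(rs.name)
            remaining -= w
    return picked


rowsets = [
    # rs_a looks attractive by base + redo size but carries 4 GB of undo deltas.
    Rowset("rs_a", value=0.9, base_bytes=64 * MB, redo_bytes=8 * MB, undo_bytes=4096 * MB),
    Rowset("rs_b", value=0.5, base_bytes=64 * MB, redo_bytes=8 * MB, undo_bytes=16 * MB),
    Rowset("rs_c", value=0.4, base_bytes=32 * MB, redo_bytes=4 * MB, undo_bytes=8 * MB),
]
budget = 256 * MB

print(pick_rowsets(rowsets, budget, include_undo=False))  # ['rs_a', 'rs_c', 'rs_b']
print(pick_rowsets(rowsets, budget, include_undo=True))   # ['rs_c', 'rs_b']
```

With undo deltas ignored, rs_a has the highest density and is picked first even
though compacting it would mean reading about 4 GB of undo data. Once undo size
is counted in the weight, rs_a no longer fits the 256 MB budget and is skipped.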
--
This message was sent by Atlassian Jira
(v8.20.10#820010)