Ashwani Raina created KUDU-3734:
-----------------------------------

             Summary: Rowset merge compaction does not consider undo delta size 
while picking rowsets
                 Key: KUDU-3734
                 URL: https://issues.apache.org/jira/browse/KUDU-3734
             Project: Kudu
          Issue Type: Bug
            Reporter: Ashwani Raina
            Assignee: Ashwani Raina


Rowset merge compaction has a preliminary stage in which it goes through all 
the eligible rowsets in a tablet and selects those that fit the criteria for an 
optimal compaction with respect to budget, rowset width, range overlaps, etc.

It uses a fractional knapsack algorithm to come up with the list of rowsets 
that delivers the maximum bang for the buck. Although a rowset merge compaction 
touches almost all parts of the data pertaining to the rows within a tablet 
(e.g., base data, redo deltas, and undo deltas), it does not take the size of 
the undo deltas into consideration while calculating the density of a rowset 
item in the knapsack algorithm.
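
To illustrate the effect described above, here is a minimal sketch in Python. 
It is not Kudu's actual compaction policy code: the RowSet fields, the value 
numbers, the 256 MB budget, and the greedy density-ordered selection are all 
assumptions made up for this example. It only shows how leaving the undo delta 
size out of an item's weight can change which rowsets get picked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowSet:
    # All fields are hypothetical; they do not mirror Kudu's real structures.
    name: str
    value: float   # assumed benefit of compacting this rowset
    base_mb: int   # base data size
    redo_mb: int   # redo delta size
    undo_mb: int   # undo delta size


def weight_mb(rs: RowSet, include_undo: bool) -> int:
    """Weight of a knapsack item, optionally counting undo deltas."""
    weight = rs.base_mb + rs.redo_mb
    if include_undo:
        weight += rs.undo_mb
    return weight


def density(rs: RowSet, include_undo: bool) -> float:
    """Value per MB, the ordering key for the greedy selection."""
    return rs.value / weight_mb(rs, include_undo)


def pick_rowsets(rowsets, budget_mb: int, include_undo: bool):
    """Greedy pick in decreasing density order; whole rowsets only."""
    ranked = sorted(rowsets, key=lambda rs: density(rs, include_undo),
                    reverse=True)
    picked, remaining = [], budget_mb
    for rs in ranked:
        weight = weight_mb(rs, include_undo)
        if weight <= remaining:
            picked.append(rs.name)
            remaining -= weight
    return picked


# Made-up data: rs_a looks cheap until its 4 GB of undo deltas are counted.
ROWSETS = [
    RowSet("rs_a", value=0.9, base_mb=64, redo_mb=16, undo_mb=4096),
    RowSet("rs_b", value=0.5, base_mb=64, redo_mb=32, undo_mb=8),
    RowSet("rs_c", value=0.4, base_mb=32, redo_mb=16, undo_mb=4),
]

print(pick_rowsets(ROWSETS, 256, include_undo=False))
print(pick_rowsets(ROWSETS, 256, include_undo=True))
```

With these made-up numbers, the selection that ignores undo deltas ranks rs_a 
first and picks it, even though compacting it would touch far more data than 
the 256 MB budget suggests. Counting undo deltas in the weight pushes rs_a to 
the bottom of the ranking and leaves it out.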

Due to this, a rowset can still get picked for rowset merge compaction even if 
its undo deltas amount to a huge size. This can lead to OOM scenarios: if, say, 
the undo deltas are GBs in size, then when the compaction finally starts 
reading the rows and the corresponding uncompressed data from the deltas into 
memory, the process might cross the memory hard limit, forcing the OS to take 
action and kill the Kudu service. One such example is KUDU-3406.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
