[ https://issues.apache.org/jira/browse/KUDU-3734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18060226#comment-18060226 ]

Ashwani Raina commented on KUDU-3734:
-------------------------------------

Ticket to verify regression via a test is tracked here: 
https://issues.apache.org/jira/browse/KUDU-3741

 

> Rowset merge compaction does not consider undo delta size while picking 
> rowsets
> -------------------------------------------------------------------------------
>
>                 Key: KUDU-3734
>                 URL: https://issues.apache.org/jira/browse/KUDU-3734
>             Project: Kudu
>          Issue Type: Bug
>            Reporter: Ashwani Raina
>            Assignee: Ashwani Raina
>            Priority: Major
>
> Rowset merge compaction has a preliminary stage that goes through all the 
> eligible rowsets in a tablet that fit the criteria for an optimal compaction 
> w.r.t. budget, rowset width, range overlaps, etc.
> It uses a fractional knapsack algorithm to come up with the list of rowsets 
> that deliver the maximum bang for the buck. While a rowset merge compaction 
> touches almost all parts of the data pertaining to the rows within a 
> tablet (e.g., base data, redo and undo deltas), it does not take the size 
> of the undo deltas into consideration when calculating the density of a 
> rowset item in the knapsack algorithm.
> As a result, even if a rowset has undo deltas amounting to a huge size, it 
> can still get picked for rowset merge compaction. This can lead to OOM 
> scenarios: if the undo deltas are, say, gigabytes in size, then when the 
> compaction finally starts reading the rows and the corresponding 
> uncompressed delta data into memory, the process might cross the memory 
> hard limit, forcing the OS to take action and kill the Kudu service. One 
> such example is KUDU-3406.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
