[
https://issues.apache.org/jira/browse/KUDU-3734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18060226#comment-18060226
]
Ashwani Raina commented on KUDU-3734:
-------------------------------------
A follow-up ticket to verify the regression via a test is tracked here:
https://issues.apache.org/jira/browse/KUDU-3741
> Rowset merge compaction does not consider undo delta size while picking
> rowsets
> -------------------------------------------------------------------------------
>
> Key: KUDU-3734
> URL: https://issues.apache.org/jira/browse/KUDU-3734
> Project: Kudu
> Issue Type: Bug
> Reporter: Ashwani Raina
> Assignee: Ashwani Raina
> Priority: Major
>
> Rowset merge compaction has a preliminary stage where it goes through all the
> eligible rowsets in a tablet that fit the criteria of optimal compaction
> w.r.t. budget, rowset width, range overlaps, etc.
> It uses a fractional knapsack algorithm to come up with the list of rowsets
> that deliver the maximum bang for the buck. While a rowset merge compaction
> touches almost all parts of the data pertaining to the rows within a
> tablet, e.g., base data, redo and undo deltas, it does not take the size of
> undo deltas into consideration while calculating the density of a rowset
> item in the knapsack algorithm.
> Due to this, even if a rowset's undo deltas amount to a huge size, it is
> possible that it will get picked for rowset merge compaction. This can
> lead to OOM scenarios: if, say, the undo deltas total several GBs, then when
> compaction finally starts reading the rows and the corresponding uncompressed
> delta data into memory, the process might cross the memory hard limit,
> forcing the OS to step in and kill the Kudu service. One such example is
> KUDU-3406.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)