[
https://issues.apache.org/jira/browse/KUDU-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexey Serbin reassigned KUDU-3429:
-----------------------------------
Assignee: Alexey Serbin
> Refactor CompactRowSetsOp to run on a pre-determined memory budget
> -------------------------------------------------------------------
>
> Key: KUDU-3429
> URL: https://issues.apache.org/jira/browse/KUDU-3429
> Project: Kudu
> Issue Type: Improvement
> Reporter: Alexey Serbin
> Assignee: Alexey Serbin
> Priority: Major
>
> [KUDU-3406|https://issues.apache.org/jira/browse/KUDU-3406] added memory
> budgeting for running CompactRowSetsOp maintenance operations. By its nature,
> that's an interim approach: it adds memory budgeting on top of the current
> CompactRowSetsOp implementation as-is.
> Ideally, the implementation of CompactRowSetsOp should be refactored to merge
> the deltas of the participating rowsets sequentially, chunk by chunk,
> persisting the intermediate results and allocating memory only for the small
> batch of deltas currently being processed, instead of loading all the deltas
> at once.
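> Below is a minimal, self-contained sketch (plain C++, not Kudu's actual
> classes) of the chunked approach described above: a k-way merge over the
> participating delta streams that buffers at most a memory budget's worth of
> merged deltas before flushing them, instead of materializing all inputs and
> the whole merged output in memory. All names here (Delta, DeltaSource,
> FlushChunk, ...) are hypothetical placeholders for the real rowset/delta
> machinery.
> {code}
> #include <cstddef>
> #include <cstdint>
> #include <queue>
> #include <utility>
> #include <vector>
>
> struct Delta {
>   int64_t row_key;                    // ordering key for the merge
>   std::vector<uint8_t> payload;
>   size_t SizeBytes() const { return sizeof(*this) + payload.size(); }
> };
>
> // Abstract sequential reader over one rowset's deltas.
> class DeltaSource {
>  public:
>   virtual ~DeltaSource() = default;
>   virtual bool Next(Delta* out) = 0;  // returns false when exhausted
> };
>
> // Persist one merged chunk; in the real operation this would write
> // the result blocks to disk.
> void FlushChunk(std::vector<Delta>* chunk) {
>   // ... write to disk ...
>   chunk->clear();
> }
>
> // K-way merge that buffers at most ~memory_budget_bytes of merged deltas
> // before flushing, rather than loading everything up front.
> void MergeDeltasChunked(const std::vector<DeltaSource*>& sources,
>                         size_t memory_budget_bytes) {
>   using Head = std::pair<Delta, size_t>;  // next delta + index of its source
>   auto cmp = [](const Head& a, const Head& b) {
>     return a.first.row_key > b.first.row_key;  // min-heap on row_key
>   };
>   std::priority_queue<Head, std::vector<Head>, decltype(cmp)> heap(cmp);
>   for (size_t i = 0; i < sources.size(); ++i) {
>     Delta d;
>     if (sources[i]->Next(&d)) heap.emplace(std::move(d), i);
>   }
>
>   std::vector<Delta> chunk;
>   size_t buffered_bytes = 0;
>   while (!heap.empty()) {
>     Head head = heap.top();   // copy out the smallest element
>     heap.pop();
>     buffered_bytes += head.first.SizeBytes();
>     chunk.push_back(std::move(head.first));
>
>     Delta next;
>     if (sources[head.second]->Next(&next)) {
>       heap.emplace(std::move(next), head.second);
>     }
>     if (buffered_bytes >= memory_budget_bytes) {
>       FlushChunk(&chunk);     // persist intermediate results
>       buffered_bytes = 0;     // memory stays bounded by the budget
>     }
>   }
>   if (!chunk.empty()) FlushChunk(&chunk);
> }
> {code}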
> This JIRA item is to track the work in the context outlined above.
> Key points to address in this scope:
> * even though it's a merge-like operation by its nature, the current
> implementation of CompactRowSetsOp allocates all the memory necessary to load
> the UNDO deltas at once, and it also keeps all the intermediate results in
> memory before persisting the result data to disk
> * the current implementation of CompactRowSetsOp loads all the UNDO deltas
> from the rowsets selected for compaction regardless of whether they are
> ancient or not; it discards the data sourced from the ancient deltas only at
> the very end, before persisting the result data (see the sketch after this
> list for one way to skip such deltas earlier)
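> One possible way to address the second point, sketched below under the same
> caveat that the names (DeltaBlock, ancient_history_mark, ...) are hypothetical
> placeholders: filter out UNDO delta blocks that lie entirely behind the
> ancient history mark before loading them, so their data never has to be
> merged and then discarded at the end.
> {code}
> #include <cstdint>
> #include <vector>
>
> struct DeltaBlock {
>   int64_t max_timestamp;               // newest timestamp in the block
>   std::vector<uint8_t> encoded_deltas; // not decoded until actually needed
> };
>
> // Return only the blocks that can still affect the compaction result;
> // blocks entirely behind the ancient history mark are never loaded.
> std::vector<const DeltaBlock*> SelectRelevantUndoBlocks(
>     const std::vector<DeltaBlock>& blocks,
>     int64_t ancient_history_mark) {
>   std::vector<const DeltaBlock*> relevant;
>   for (const auto& b : blocks) {
>     if (b.max_timestamp >= ancient_history_mark) {
>       relevant.push_back(&b);
>     }
>     // else: ancient block, neither loaded nor merged
>   }
>   return relevant;
> }
> {code}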
> Also, while keeping memory usage within a predetermined budget, the new
> implementation of CompactRowSetsOp should strive to avoid I/O amplification
> as much as possible.
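> A tiny illustration of that tradeoff (names and constants are illustrative
> only): the chunk size is derived from the memory budget, but clamped from
> below so that flushed chunks do not become so small that the same data ends
> up written and re-read in many tiny pieces.
> {code}
> #include <algorithm>
> #include <cstddef>
>
> size_t RowsPerChunk(size_t memory_budget_bytes,
>                     size_t est_bytes_per_row,
>                     size_t min_rows_per_flush) {
>   if (est_bytes_per_row == 0) est_bytes_per_row = 1;
>   const size_t rows = memory_budget_bytes / est_bytes_per_row;
>   // Tiny chunks keep memory low but multiply the number of blocks
>   // written and later read back, so enforce a floor on the chunk size.
>   return std::max(rows, min_rows_per_flush);
> }
> {code}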
--
This message was sent by Atlassian Jira
(v8.20.10#820010)