Alexey Serbin created KUDU-3429:
-----------------------------------
Summary: Refactor CompactRowSetsOp to run on a pre-determined
memory budget
Key: KUDU-3429
URL: https://issues.apache.org/jira/browse/KUDU-3429
Project: Kudu
Issue Type: Improvement
Reporter: Alexey Serbin
[KUDU-3406|https://issues.apache.org/jira/browse/KUDU-3406] added memory
budgeting for running CompactRowSetsOp maintenance operations. By its nature,
that is an interim approach: it adds memory budgeting on top of the current
CompactRowSetsOp implementation as-is.
Ideally, the implementation of CompactRowSetsOp should be refactored to merge
the deltas in participating rowsets sequentially, chunk by chunk, persisting
the results and allocating memory only for the small batch of deltas being
processed, rather than loading all the deltas at once.
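To illustrate the chunk-by-chunk idea, here is a minimal Python sketch (not Kudu code). It assumes each rowset's deltas can be read as an already sorted stream; the (row_key, timestamp, payload) tuple shape and the name merge_deltas_chunked are hypothetical and chosen only for the example.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Hypothetical delta shape for illustration: (row_key, timestamp, payload).
Delta = Tuple[int, int, str]


def merge_deltas_chunked(
    inputs: List[Iterable[Delta]], chunk_size: int
) -> Iterator[List[Delta]]:
    """Merge sorted delta streams, yielding chunks of at most chunk_size."""
    chunk: List[Delta] = []
    # heapq.merge is lazy: it keeps one pending item per input stream,
    # so memory use is bounded by the number of inputs plus one chunk.
    for delta in heapq.merge(*inputs):
        chunk.append(delta)
        if len(chunk) >= chunk_size:
            yield chunk  # the caller persists this chunk before continuing
            chunk = []
    if chunk:
        yield chunk  # final, possibly short, chunk
```

A caller would write each yielded chunk to disk before requesting the next one, so only one output chunk is resident at a time.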
This JIRA item is to track the work in the context outlined above.
Below are the key points to address in this scope:
* even though it is a merge-like operation by nature, the current
implementation of CompactRowSetsOp allocates all the memory necessary to load
the UNDO deltas at once, and it also keeps all the intermediate results in
memory before persisting the result data to disk
* the current implementation of CompactRowSetsOp loads all the UNDO deltas from
the rowsets selected for compaction regardless of whether they are ancient or
not; it discards the data sourced from the ancient deltas only at the very end,
before persisting the result data
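As a rough illustration of the second point, ancient deltas could be dropped while they are being read rather than at the very end. The Python sketch below is not Kudu code: it assumes a delta counts as ancient when its timestamp is below a hypothetical ancient_history_mark value, and the tuple shape and the function name skip_ancient are likewise made up for the example.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical delta shape for illustration: (row_key, timestamp, payload).
Delta = Tuple[int, int, str]


def skip_ancient(
    deltas: Iterable[Delta], ancient_history_mark: int
) -> Iterator[Delta]:
    """Lazily drop deltas older than the (assumed) ancient history mark."""
    for delta in deltas:
        _, timestamp, _ = delta
        # Assumption: a delta is ancient if its timestamp is below the mark.
        if timestamp >= ancient_history_mark:
            yield delta
```

Filtering at the source means ancient deltas are never buffered or merged, so they do not count against the memory budget.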
Also, while keeping memory usage within a predetermined budget, the new
implementation of CompactRowSetsOp should strive to avoid IO multiplication as
much as possible.
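One way to reason about the trade-off between the memory budget and IO multiplication is the standard external merge estimate, sketched below in Python. This is borrowed from external merge sort, not from the Kudu code base, and it assumes one fixed-size read buffer per input plus one output buffer; the name merge_passes and its parameters are hypothetical.

```python
def merge_passes(num_inputs: int, budget_bytes: int, buffer_bytes: int) -> int:
    """Estimate how many merge passes are needed under a memory budget.

    Each pass reads and rewrites the data once, so the result approximates
    the IO multiplication factor.
    """
    # One buffer is reserved for output; the rest can serve as input buffers.
    fan_in = budget_bytes // buffer_bytes - 1
    if fan_in < 2:
        raise ValueError("budget too small to merge even two inputs")
    passes = 0
    remaining = num_inputs
    while remaining > 1:
        remaining = -(-remaining // fan_in)  # ceiling division
        passes += 1
    return passes
```

If the budget allows a buffer for every input, the estimate is a single pass and each delta is read once; a tighter budget forces intermediate merges and more IO.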