Alexey Serbin created KUDU-3429:
-----------------------------------

             Summary: Refactor CompactRowSetsOp to run on a pre-determined 
memory budget 
                 Key: KUDU-3429
                 URL: https://issues.apache.org/jira/browse/KUDU-3429
             Project: Kudu
          Issue Type: Improvement
            Reporter: Alexey Serbin


[KUDU-3406|https://issues.apache.org/jira/browse/KUDU-3406] added memory 
budgeting for running CompactRowSetsOp maintenance operations.  By its nature, 
that is an interim approach: it adds memory budgeting on top of the current 
CompactRowSetsOp implementation as-is.

Ideally, the implementation of CompactRowSetsOp should be refactored to merge 
the deltas in participating rowsets sequentially, chunk by chunk, persisting 
the results and allocating memory only for the small batch of deltas being 
processed, rather than loading all the deltas at once.
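
As an illustration of the chunk-by-chunk idea only, here is a minimal Python sketch of a lazy k-way merge that hands out fixed-size batches. The record layout (row_id, timestamp, payload), the function name merge_in_chunks, and the chunk_size parameter are all hypothetical and are not taken from the Kudu code base; the real delta stores are keyed and organized differently.

```python
import heapq
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# Hypothetical delta record: (row_id, timestamp, payload). Not Kudu's real layout.
Delta = Tuple[int, int, str]


def merge_in_chunks(sources: List[Iterable[Delta]],
                    chunk_size: int) -> Iterator[List[Delta]]:
    """Yield merged deltas in lists of at most chunk_size items.

    Each source is assumed to be sorted by (row_id, timestamp) already.
    """
    # heapq.merge is lazy: it keeps one pending item per source, not whole inputs.
    merged = heapq.merge(*sources)
    while True:
        chunk = list(islice(merged, chunk_size))
        if not chunk:
            return
        # The caller would persist this chunk and then release it.
        yield chunk


rowset_a = [(1, 10, "a1"), (3, 12, "a3"), (5, 11, "a5")]
rowset_b = [(2, 9, "b2"), (3, 15, "b3")]
chunks = list(merge_in_chunks([iter(rowset_a), iter(rowset_b)], chunk_size=2))
```

With this shape, peak memory is bounded by roughly chunk_size records plus one buffered record per input, independent of the total number of deltas.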

This JIRA item is to track the work in the context outlined above.

Below are the key points to address in this scope:
* even though it's a merge-like operation by its nature, the current 
implementation of CompactRowSetsOp allocates all the memory necessary to load 
the UNDO deltas at once, and it also keeps all the intermediate results in 
memory before persisting the result data to disk
* the current implementation of CompactRowSetsOp loads all the UNDO deltas from 
the rowsets selected for compaction regardless of whether they are ancient or 
not; it discards the data sourced from the ancient deltas only at the very end, 
just before persisting the result data
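
To illustrate the second point, here is a minimal Python sketch of dropping ancient deltas as they are read, so they never reach the merge buffer. The record layout, the function name skip_ancient, and the rule "a delta is ancient when its timestamp is below ancient_history_mark" are assumptions made for this example, not a description of the actual CompactRowSetsOp code.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical delta record: (row_id, timestamp, payload). Not Kudu's real layout.
Delta = Tuple[int, int, str]


def skip_ancient(deltas: Iterable[Delta],
                 ancient_history_mark: int) -> Iterator[Delta]:
    """Lazily pass through only the deltas at or after the mark."""
    for delta in deltas:
        # Assumed cutoff rule: timestamps below the mark count as ancient.
        if delta[1] >= ancient_history_mark:
            yield delta


undo_deltas = [(1, 5, "old"), (1, 20, "new"), (2, 7, "old"), (2, 30, "new")]
kept = list(skip_ancient(undo_deltas, ancient_history_mark=10))
```

Because the filter is a generator, it can wrap each input before the merge step, so memory is spent only on deltas that will end up in the result. This sketch still reads every delta from its source; avoiding the read itself would need support from the storage layer.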

Also, while keeping memory usage within a predetermined budget, the new 
implementation of CompactRowSetsOp should strive to avoid IO multiplication as 
much as possible.




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
