Alexey Goncharuk created IGNITE-12263:
-----------------------------------------
Summary: Introduce native persistence compaction operation
Key: IGNITE-12263
URL: https://issues.apache.org/jira/browse/IGNITE-12263
Project: Ignite
Issue Type: Improvement
Reporter: Alexey Goncharuk
Currently, Ignite native persistence does not shrink storage files after
key-value pairs are removed.
The causes of this behavior are:
* The absence of a mechanism that allows Ignite to track the highest non-empty
page position in a partition file
* The absence of a mechanism that allows Ignite to select the page closest to
the beginning of the file for writes
* The absence of a mechanism that allows Ignite to move a key-value pair from
page to page during defragmentation
As an initial change, I suggest introducing a new node startup mode that runs
a defragmentation procedure allowing the node to shrink its storage files. The
procedure will not mutate the logical state of a partition, so a subsequent
historical rebalance can quickly catch the node up. Since the procedure runs
during node startup (in the final stages of recovery), there is no concurrent
load, so entries can be moved freely from page to page without tricky
synchronization.
If the procedure is applied during a whole-cluster restart, all nodes will be
defragmented simultaneously, which gives a quicker parallel defragmentation at
the cost of downtime.
The procedure should accept an optional list of cache groups, so that
defragmentation can be limited to an arbitrary selection of cache groups.
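The selection rule could look roughly like the sketch below. It is only an
illustration: the function name groups_to_defragment and its parameters are
hypothetical, and the choice to treat a missing or empty list as "all cache
groups" and to reject unknown names are assumptions, not decided behavior.

```python
def groups_to_defragment(all_groups, requested=None):
    """Return the cache groups to process (hypothetical helper).

    Assumption: no list (or an empty one) means "defragment every group".
    """
    if not requested:
        return list(all_groups)
    # Assumption: an unknown group name is an error rather than silently skipped.
    unknown = [g for g in requested if g not in all_groups]
    if unknown:
        raise ValueError(f"Unknown cache groups: {unknown}")
    return list(requested)
```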
An outline of the actions taken during the run for each partition selected for
defragmentation:
* Partition pages are preloaded into memory if possible to avoid excessive page
replacement. During the scan, the high-water mark (HWM) of the written data is
detected (empty pages are skipped)
* Page references in the free list are sorted so that the pages closest to the
file start are picked first
* The partition is scanned in reverse order. Key-value pairs are moved closer
to the file start, and the HWM is updated accordingly. This step is
particularly open to optimization because different strategies will work well
for different fragmentation patterns.
* After the scan iteration is completed, the file can be truncated to the size
given by the HWM
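The steps above can be illustrated with a toy model. This is only a sketch
under strong simplifying assumptions, not Ignite code: a partition is modeled
as an in-memory list of pages, each page is a list of equally sized entries
with a fixed capacity, and there are no indexes or links to update when an
entry moves. The name defragment and its parameters are hypothetical.

```python
import heapq

def defragment(pages, capacity):
    """Compact entries towards the file start and truncate in place.

    Toy model: `pages` is a list of lists, each holding at most `capacity`
    entries. Returns the new HWM (index of the last non-empty page, or -1).
    """
    # Step 1: scan for the HWM, skipping empty pages.
    hwm = max((i for i, page in enumerate(pages) if page), default=-1)

    # Step 2: free list as a min-heap of page indexes with spare room,
    # so the page closest to the file start is always picked first.
    free = [i for i in range(hwm + 1) if len(pages[i]) < capacity]
    heapq.heapify(free)

    # Step 3: reverse scan; move entries from the tail into the lowest free page.
    src = hwm
    while free and free[0] < src:
        if not pages[src]:
            src -= 1          # tail page drained, lower the scan position
            continue
        dst = free[0]
        pages[dst].append(pages[src].pop())
        if len(pages[dst]) == capacity:
            heapq.heappop(free)  # destination is full, drop it from the free list

    # The last source page may have been emptied right before the loop ended.
    while src >= 0 and not pages[src]:
        src -= 1

    # Step 4: "truncate the file" to the HWM.
    del pages[src + 1:]
    return src
```

For example, with pages [["a"], [], ["b", "c"], [], ["d"]] and a capacity of
2, the call leaves two full pages holding the same four entries and returns
an HWM of 1. The set of entries is unchanged, which mirrors the requirement
that the logical state of the partition is not mutated.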
As a further improvement, this partition defragmentation procedure can later
be run in online mode, once the necessary cache update protocol changes are
designed.