Ivan Bessonov created IGNITE-18031:
--------------------------------------
Summary: Implement background GC process for MV partition storages
Key: IGNITE-18031
URL: https://issues.apache.org/jira/browse/IGNITE-18031
Project: Ignite
Issue Type: Improvement
Reporter: Ivan Bessonov
Please refer to the Epic for more details. Here I only describe thoughts about
the background GC process.
h3. General thoughts
The basic algorithm is the following:
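{code:java}
RowId rowId = null;
while (!partitionStopped) {
    // Start (or restart) the scan from the minimal row ID of the partition.
    if (rowId == null) rowId = minRowId(partId);
    // Jump to the closest row that actually exists in the storage.
    rowId = partition.closestRowId(rowId);
    partition.gc(rowId);
    // Move on to the next candidate position.
    rowId = rowId.increment();
}{code}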
Here I ignore a lot of technicalities and only show the main loop. We _could_
implement it literally the same way, but that would be a very bad decision. Why:
* it fully utilizes a single thread. That would mean one thread per
partition, which is unacceptable
* it constantly reads the entire partition over and over again. That is a
waste of resources; we'd rather prioritize reading the data that the user
needs. There should at least be pauses between full runs
To address both of these issues, the job should be split into small batches
(like we do in many other places), and every new batch should be put into a pool
only when the current one is completed. This allows multiple partitions to
share the same pool without the possibility of starvation.
The part with the _pause_ should probably involve a scheduled pool. That seems
enough for the first implementation.
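
The batching and the pause could be sketched roughly as follows. This is only an illustration under several assumptions, not the proposed implementation: {{Partition}} is a hypothetical stand-in whose rows are plain indexes instead of {{RowId}}, its {{gc}} only counts calls, and {{BATCH_SIZE}} and {{PAUSE_MILLIS}} are made-up values. Each batch submits the next one only after it has finished, and a completed full run reschedules itself with a delay.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: batched background GC on a shared scheduled pool. */
class BatchedGcSketch {
    /** Hypothetical stand-in for a partition storage; rows are indexes 0..rowCount-1. */
    static class Partition {
        final int rowCount;
        final AtomicInteger gcCalls = new AtomicInteger();
        final AtomicInteger fullRuns = new AtomicInteger();
        volatile boolean stopped;

        Partition(int rowCount) {
            this.rowCount = rowCount;
        }

        /** Placeholder for the real per-row GC; here it only counts invocations. */
        void gc(int rowId) {
            gcCalls.incrementAndGet();
        }
    }

    static final int BATCH_SIZE = 100;    // assumed number of rows per batch
    static final long PAUSE_MILLIS = 50;  // assumed pause between full runs

    /** Puts a single batch into the pool, optionally after a delay. */
    static void scheduleBatch(ScheduledExecutorService pool, Partition p, int startRow, long delayMillis) {
        pool.schedule(() -> runBatch(pool, p, startRow), delayMillis, TimeUnit.MILLISECONDS);
    }

    /** Processes one batch, then submits the next one; the thread is released in between. */
    static void runBatch(ScheduledExecutorService pool, Partition p, int startRow) {
        if (p.stopped) {
            return;
        }
        int end = Math.min(startRow + BATCH_SIZE, p.rowCount);
        for (int row = startRow; row < end; row++) {
            p.gc(row);
        }
        if (end < p.rowCount) {
            // More rows left: the next batch goes to the back of the pool's queue.
            scheduleBatch(pool, p, end, 0);
        } else {
            // Full run completed: pause before scanning the partition again.
            p.fullRuns.incrementAndGet();
            scheduleBatch(pool, p, 0, PAUSE_MILLIS);
        }
    }

    /** Runs two partitions on a two-thread pool for a short time; returns their full-run counts. */
    static int[] demo() {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(2);
        Partition small = new Partition(250);
        Partition large = new Partition(1_000);
        scheduleBatch(pool, small, 0, 0);
        scheduleBatch(pool, large, 0, 0);
        try {
            Thread.sleep(500);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        small.stopped = true;
        large.stopped = true;
        pool.shutdownNow();
        return new int[] {small.fullRuns.get(), large.fullRuns.get()};
    }
}
```

Because a partition never has more than one batch in the pool at a time, a large partition cannot crowd out a small one, and the number of threads does not depend on the number of partitions.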
Other ideas and approaches will be discussed in other issues.