[
https://issues.apache.org/jira/browse/IGNITE-18113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kirill Tkalenko resolved IGNITE-18113.
--------------------------------------
Resolution: Won't Do
No need to implement; each store contains its own garbage collection queue.
> Implement a rows queue for GC in MV storages
> --------------------------------------------
>
> Key: IGNITE-18113
> URL: https://issues.apache.org/jira/browse/IGNITE-18113
> Project: Ignite
> Issue Type: Improvement
> Reporter: Ivan Bessonov
> Assignee: Kirill Tkalenko
> Priority: Major
> Labels: ignite-3
>
> Please refer to the Epic for the prerequisites. Please also refer to
> IGNITE-18031 and IGNITE-18033 for some thoughts on the topic.
> IGNITE-18031 is proposed for implementation on the assumption that it's easy
> to implement and then run tests without fear of disk overflow.
> But this may not be the case, and there's a chance that it should actually be
> closed in favor of the current issue. It depends on how much time we're
> willing to spend on the implementation right now and how much simpler the
> implementation of the naive approach actually is. We'll evaluate it later.
> EDIT: that idea is abandoned, the queue is the right way.
> h3. Algorithm idea description
> First of all, why is the naive approach bad? The reason is simple - it
> requires constant reads from the device. This leads to both unnecessary disk
> operations and unnecessary page buffer cache invalidation, regardless of the
> engine. All engines have a limited amount of memory to work with; generally
> speaking, it's significantly less than the amount of data on disk.
> Another approach is to have a queue. As already stated, it's very similar to
> the way TTL is implemented in Ignite 2.x. Every time there's a row commit, we
> add a {{(commitTs, rowId)}} pair into the queue. Every time we want to evict
> something, we just iterate over all pairs up to a certain timestamp and
> delete these entries from both the queue and the row storage (see the sketch
> after the list below).
> * Pros: faster deletion, better utilization of the resources
> * Cons: slower insertions
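> A minimal sketch of this bookkeeping, using a plain ordered in-memory set
> purely for illustration (the names {{GcQueue}}, {{GcEntry}}, {{addOnCommit}}
> and {{pollUpTo}} are assumptions here, not the actual storage API):
> {code:java}
> import java.util.UUID;
> import java.util.concurrent.ConcurrentSkipListSet;
> 
> /** Toy GC queue ordered by (commitTs, rowId); illustration only. */
> class GcQueue {
>     record GcEntry(long commitTs, UUID rowId) implements Comparable<GcEntry> {
>         @Override public int compareTo(GcEntry o) {
>             int cmp = Long.compare(commitTs, o.commitTs);
> 
>             return cmp != 0 ? cmp : rowId.compareTo(o.rowId);
>         }
>     }
> 
>     private final ConcurrentSkipListSet<GcEntry> queue = new ConcurrentSkipListSet<>();
> 
>     /** Called on every commit; commitTs grows, so this is almost always an append. */
>     void addOnCommit(long commitTs, UUID rowId) {
>         queue.add(new GcEntry(commitTs, rowId));
>     }
> 
>     /** Removes and returns the head if it's below the watermark, otherwise null. */
>     GcEntry pollUpTo(long lowWatermark) {
>         GcEntry head = queue.pollFirst();
> 
>         if (head == null) {
>             return null; // Queue is empty.
>         }
> 
>         if (head.commitTs() > lowWatermark) {
>             queue.add(head); // Too fresh to collect, put it back.
> 
>             return null;
>         }
> 
>         return head;
>     }
> }
> {code}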
> Now, why aren't slow insertions a bad thing? For me personally, this is a
> tricky question to answer. The way I see it, writes are inherently slow;
> there's not much we can do about it.
> Meta information must be updated, indexes must be updated, and current
> outgoing rebalance snapshots must be taken into account. There's even a
> chance of write intent resolution taking place, which is weird, but we allow
> it. It's not necessarily instantaneous.
> Having a single additional insert, which is almost always an append (which
> may in theory decrease its run time, but that's kinda hard to measure), won't
> hurt. Especially considering that we're going to perform partial queue
> cleanup in the same thread later (but after completing the Raft done closure,
> I guess; we shouldn't block the leader from returning the result to the
> client).
> h3. Problems & solutions
> In order for the queue to be a queue, we should always delete the head once
> we've processed it. Otherwise, what's the point?
> If we implement the GC without proper thinking, we may have problems. For
> example, imagine a row with a single version that's very old and that nobody
> ever changes. GC shouldn't delete this version of the row; instead, a naive
> implementation would just re-insert it back into the queue with a slightly
> larger timestamp. This is bad: such behavior guarantees constant useless GC
> work on an idle cluster, which is not what we want.
> The way to avoid this problem _is to never store the timestamp of the oldest
> version_ in the queue. This way the algorithm may become the following:
> {code:java}
> Pair<Timestamp, RowId> pair = queue.poll();
> 
> if (pair == null || pair.getTimestamp() > lowWatermark) {
>     return; // Queue is exhausted or entries are too fresh, stay idle.
> }
> 
> // Just do it. Delete everything below the threshold.
> storage.deleteVersionsBelowTimestamp(pair.getRowId(), pair.getTimestamp());
> {code}
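> In practice a GC pass would run this in a loop with some batch limit, so that
> the partial cleanup mentioned above doesn't monopolize the thread. A rough
> sketch of such a pass, where {{GC_BATCH_SIZE}} is an illustrative assumption
> and peek/remove is used so a too-fresh head isn't lost:
> {code:java}
> // Rough sketch of a single GC pass; names mirror the snippet above.
> for (int i = 0; i < GC_BATCH_SIZE; i++) {
>     Pair<Timestamp, RowId> pair = queue.peek();
> 
>     if (pair == null || pair.getTimestamp() > lowWatermark) {
>         break; // Queue is exhausted or remaining entries are too fresh.
>     }
> 
>     queue.remove();
>     storage.deleteVersionsBelowTimestamp(pair.getRowId(), pair.getTimestamp());
> }
> {code}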
> There's an interesting situation with tombstones. This one's tricky. We have
> two options:
> * GC deletes tombstones when they become the oldest known version in the
> chain
> * GC never deletes tombstones
> If we go the first route, we have to remember that deleting a tombstone also
> _creates_ a new oldest version whose queue entry _must_ be removed to
> preserve the data structure's consistency.
> If we go the second route, we may end up with stale tombstones that will
> never be deleted.
> The first option is better, because it actually works, but it's harder to
> implement.
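> A rough sketch of what the first route implies, using hypothetical
> {{chain}}/{{queue}} helpers that are illustrative only, not a real API:
> {code:java}
> // When a tombstone is the oldest remaining version, drop it and also remove
> // the queue entry of the version that becomes the new oldest one, so that the
> // oldest version's timestamp is never referenced from the queue.
> RowVersion oldest = chain.oldestVersion();
> 
> if (oldest != null && oldest.isTombstone()) {
>     chain.removeOldestVersion();
> 
>     RowVersion newOldest = chain.oldestVersion();
> 
>     if (newOldest != null) {
>         queue.remove(new Pair<>(newOldest.commitTimestamp(), chain.rowId()));
>     }
> }
> {code}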
> EDIT: poll() should also delete the data at the same time, or return null if
> the timestamp restriction isn't met.
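> A sketch of what that combined operation might look like (the method name and
> signature are assumptions, not a committed API):
> {code:java}
> // Illustration only: one call that checks the watermark, removes the head
> // entry and deletes the row versions below it, or returns null if nothing
> // is eligible.
> RowId pollAndVacuum(Timestamp lowWatermark) {
>     Pair<Timestamp, RowId> pair = queue.peek();
> 
>     if (pair == null || pair.getTimestamp() > lowWatermark) {
>         return null; // Timestamp restriction isn't met, nothing to collect.
>     }
> 
>     queue.remove();
>     storage.deleteVersionsBelowTimestamp(pair.getRowId(), pair.getTimestamp());
> 
>     return pair.getRowId();
> }
> {code}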
--
This message was sent by Atlassian Jira
(v8.20.10#820010)