[
https://issues.apache.org/jira/browse/IGNITE-18113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kirill Tkalenko resolved IGNITE-18113.
--------------------------------------
Resolution: Won't Do
No need to implement; each store contains its own garbage collection queue.
> Implement a rows queue for GC in MV storages
> --------------------------------------------
>
> Key: IGNITE-18113
> URL: https://issues.apache.org/jira/browse/IGNITE-18113
> Project: Ignite
> Issue Type: Improvement
> Reporter: Ivan Bessonov
> Assignee: Kirill Tkalenko
> Priority: Major
> Labels: ignite-3
>
> Please refer to the Epic for the prerequisites. Please also refer to
> IGNITE-18031 and IGNITE-18033 for some thoughts on the topic.
> IGNITE-18031 is proposed for implementation on the assumption that it's easy
> to implement and then run tests without fear of disk overflow.
> But this may not be the case, and there's a chance that it should actually be
> closed in favor of the current issue. It depends on how much time we're
> willing to spend on the implementation right now and how much simpler the
> implementation of the naive approach actually is. We'll evaluate it later.
> EDIT: that idea is abandoned, the queue is the right way.
> h3. Algorithm idea description
> First of all, why is the naive approach bad? The reason is simple - it
> requires constant reads from the device. This leads to both unnecessary disk
> operations and unnecessary page buffer cache invalidation, regardless of the
> engine. All engines have a limited amount of memory to work with; generally
> speaking, it's significantly less than the amount of data on disk.
> Another approach is to have a queue. As already stated, it's very similar to
> the way TTL is implemented in Ignite 2.x. Every time there's a row commit, we
> add a {{(commitTs, rowId)}} pair into the queue. Every time we want to evict
> something, we just iterate over all pairs up to a certain timestamp and
> delete these entries from both the queue and the row storage (see the sketch
> after the list below).
> * Pros: faster deletion, better utilization of the resources
> * Cons: slower insertions
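> A minimal sketch of this bookkeeping, using a plain ordered in-memory set
> purely for illustration (the names {{GcQueue}}, {{GcEntry}}, {{addOnCommit}}
> and {{pollUpTo}} are assumptions here, not the actual storage API):
> {code:java}
> import java.util.UUID;
> import java.util.concurrent.ConcurrentSkipListSet;
> 
> /** Toy GC queue ordered by (commitTs, rowId); illustration only. */
> class GcQueue {
>     record GcEntry(long commitTs, UUID rowId) implements Comparable<GcEntry> {
>         @Override public int compareTo(GcEntry o) {
>             int cmp = Long.compare(commitTs, o.commitTs);
> 
>             return cmp != 0 ? cmp : rowId.compareTo(o.rowId);
>         }
>     }
> 
>     private final ConcurrentSkipListSet<GcEntry> queue = new ConcurrentSkipListSet<>();
> 
>     /** Called on every commit; commitTs grows, so this is almost always an append. */
>     void addOnCommit(long commitTs, UUID rowId) {
>         queue.add(new GcEntry(commitTs, rowId));
>     }
> 
>     /** Removes and returns the head if it's below the watermark, otherwise null. */
>     GcEntry pollUpTo(long lowWatermark) {
>         GcEntry head = queue.pollFirst();
> 
>         if (head == null) {
>             return null; // Queue is empty.
>         }
> 
>         if (head.commitTs() > lowWatermark) {
>             queue.add(head); // Too fresh to collect, put it back.
> 
>             return null;
>         }
> 
>         return head;
>     }
> }
> {code}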
> Now, why aren't slow insertions a bad thing? For me personally, this is a
> tricky question to answer. The way I see it, writes are inherently slow;
> there's not much we can do about it.
> Meta information must be updated, indexes must be updated, and current
> outgoing rebalance snapshots must be taken into account. There's even a
> chance of write intent resolution taking place, which is weird, but we allow
> it. It's not necessarily instantaneous.
> Having a single additional insert, which is almost always an append (which
> may in theory decrease its run time, but that's kinda hard to measure), won't
> hurt. Especially considering that we're going to perform partial queue
> cleanup in the same thread later (but after completing the Raft done closure,
> I guess; we shouldn't block the leader from returning the result to the
> client).
> h3. Problems & solutions
> In order for the queue to be a queue, we should always delete the head once
> we've processed it. Otherwise, what's the point?
> If we implement the GC without proper thinking, we may have problems. For
> example, imagine a row with a single version that's very old and that nobody
> ever changes. GC shouldn't delete this version of the row; instead, a naive
> implementation would just re-insert it back into the queue with a slightly
> larger timestamp. This is bad: such behavior guarantees constant useless GC
> work on an idle cluster, which is not what we want.
> The way to avoid this problem _is to never store the timestamp of the oldest
> version_ in the queue. This way the algorithm may become the following:
> {code:java}
> Pair<Timestamp, RowId> pair = queue.poll();
> 
> if (pair == null || pair.getTimestamp() > lowWatermark) {
>     return; // Queue is exhausted or entries are too fresh, stay idle.
> }
> 
> // Just do it. Delete everything below the threshold.
> storage.deleteVersionsBelowTimestamp(pair.getRowId(), pair.getTimestamp());
> {code}
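> In practice a GC pass would run this in a loop with some batch limit, so that
> the partial cleanup mentioned above doesn't monopolize the thread. A rough
> sketch of such a pass, where {{GC_BATCH_SIZE}} is an illustrative assumption
> and peek/remove is used so a too-fresh head isn't lost:
> {code:java}
> // Rough sketch of a single GC pass; names mirror the snippet above.
> for (int i = 0; i < GC_BATCH_SIZE; i++) {
>     Pair<Timestamp, RowId> pair = queue.peek();
> 
>     if (pair == null || pair.getTimestamp() > lowWatermark) {
>         break; // Queue is exhausted or remaining entries are too fresh.
>     }
> 
>     queue.remove();
>     storage.deleteVersionsBelowTimestamp(pair.getRowId(), pair.getTimestamp());
> }
> {code}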
> There's an interesting situation with tombstones. This one's tricky. We have
> two options:
> * GC deletes tombstones when they become the oldest known version in the
> chain
> * GC never deletes tombstones
> If we go the first route, we have to remember that deleting a tombstone also
> _creates_ a new oldest version whose queue entry _must_ be removed to
> preserve the data structure's consistency.
> If we go the second route, we may end up with stale tombstones that will
> never be deleted.
> The first option is better, because it actually works, but it's harder to
> implement.
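> A rough sketch of what the first route implies, using hypothetical
> {{chain}}/{{queue}} helpers that are illustrative only, not a real API:
> {code:java}
> // When a tombstone is the oldest remaining version, drop it and also remove
> // the queue entry of the version that becomes the new oldest one, so that the
> // oldest version's timestamp is never referenced from the queue.
> RowVersion oldest = chain.oldestVersion();
> 
> if (oldest != null && oldest.isTombstone()) {
>     chain.removeOldestVersion();
> 
>     RowVersion newOldest = chain.oldestVersion();
> 
>     if (newOldest != null) {
>         queue.remove(new Pair<>(newOldest.commitTimestamp(), chain.rowId()));
>     }
> }
> {code}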
> EDIT: poll() should also delete the data at the same time, or return null if
> the timestamp restriction isn't met.
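> A sketch of what that combined operation might look like (the method name and
> signature are assumptions, not a committed API):
> {code:java}
> // Illustration only: one call that checks the watermark, removes the head
> // entry and deletes the row versions below it, or returns null if nothing
> // is eligible.
> RowId pollAndVacuum(Timestamp lowWatermark) {
>     Pair<Timestamp, RowId> pair = queue.peek();
> 
>     if (pair == null || pair.getTimestamp() > lowWatermark) {
>         return null; // Timestamp restriction isn't met, nothing to collect.
>     }
> 
>     queue.remove();
>     storage.deleteVersionsBelowTimestamp(pair.getRowId(), pair.getTimestamp());
> 
>     return pair.getRowId();
> }
> {code}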
--
This message was sent by Atlassian Jira
(v8.20.10#820010)