tkalkirill commented on code in PR #1619:
URL: https://github.com/apache/ignite-3/pull/1619#discussion_r1097033780

##########
modules/storage-rocksdb/docs/garbage-collection.md:
##########
@@ -0,0 +1,83 @@
+# Garbage Collection in the RocksDB partition storage
+
+## Garbage Collection queue
+
+We store garbage collector's queue in the RocksDB column family in the following
+format. The key:
+
+| Partition id | Timestamp                                 | Row id         |
+|--------------|-------------------------------------------|----------------|
+| 2-byte       | 12-byte (8-byte physical, 4-byte logical) | 16-byte (uuid) |
+
+The value is not stored, as we only need the key. We can make row id the value,
+because for the ascending order processing of the queue we only need the timestamp,
+however, multiple row ids can have same timestamp, so making row id a value requires storing a list of
+row ids, hence the commit in this implementation of the storage becomes more sophisticated and, probably,
+less performant.
+
+Each time a row is being committed to the storage, we perform a check whether
+there is already a value for this row. If there is one and both it and new version are not tombstones, we put
+new commit's timestamp and row id into the GC queue. To understand why we only put new value's timestamp
+please refer to the Garbage Collection [algorithm](#garbage-collection-algorithm).
+The queue is updated along with the data column family in a single batch and is destroyed when the storage
+is being cleared or destroyed.
+
+## Garbage Collection algorithm
+
+It's important to understand when we actually need to perform garbage collection.
+
+Consider the following example:
+*Note that **Record number** is a hypothetical value that helps referring to the specific entries, there
+is no such value in the storage.*
+
+| Record number | Row id | Timestamp |
+|---------------|--------|-----------|
+| 1             | Foo    | 1         |
+| 2             | Foo    | 10        |
+
+In this case, we can only remove record 1 if the low watermark is 10 or higher. If watermark is at 9,
+then it means that there can still occur a transaction with a 9 timestamp, which means that the record number 1
+is still needed.

Review Comment:
   You are right!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
