Ivan Bessonov created IGNITE-19395:
--------------------------------------
Summary: Reduce write amplification for RocksDB partition storage
Key: IGNITE-19395
URL: https://issues.apache.org/jira/browse/IGNITE-19395
Project: Ignite
Issue Type: Improvement
Reporter: Ivan Bessonov
Currently, the "commit" operation in the RocksDB storage looks like this:
{code:java}
// Pseudocode: the payload is read back and then written a second time.
val data = db.read(writeIntentKey);
db.remove(writeIntentKey);
db.write(committedKey, data);{code}
This is wasteful: we end up writing everything twice. An alternative is to
add a level of indirection to the data:
{code:java}
// RowId index.
[ TableId?? | PartId | RowId | Timestamp ] -> [ DataId ]
[ TableId?? | PartId | RowId ] -> [ DataId | TxId | CommitTableId | CommitPartId ]
// Data.
[ DataId ] -> [ Payload ]{code}
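To make the effect on the commit path concrete, here is a minimal in-memory sketch of the layout above. It is only an illustration under stated assumptions: plain maps stand in for RocksDB, keys are simplified to strings, the table/partition prefixes and the {{CommitTableId}}/{{CommitPartId}} fields are omitted, and every class and method name ({{IndirectCommitSketch}}, {{addWrite}}, {{commit}}, {{readCommitted}}) is hypothetical rather than an existing Ignite API.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of the proposed layout; plain maps stand in for RocksDB. */
class IndirectCommitSketch {
    /** Value of a write-intent index entry: [ DataId | TxId ]. */
    static final class WriteIntent {
        final String dataId;
        final String txId;

        WriteIntent(String dataId, String txId) {
            this.dataId = dataId;
            this.txId = txId;
        }
    }

    // [ RowId ] -> [ DataId | TxId ] (pending write intents)
    final Map<String, WriteIntent> writeIntents = new HashMap<>();
    // [ RowId | Timestamp ] -> [ DataId ] (committed versions)
    final Map<String, String> committed = new HashMap<>();
    // [ DataId ] -> [ Payload ]
    final Map<String, byte[]> data = new HashMap<>();

    // Counts payload bytes written, to show that commit adds none.
    long payloadBytesWritten;

    /** Stores the payload once and records a write intent that points to it. */
    void addWrite(String rowId, String txId, String dataId, byte[] payload) {
        data.put(dataId, payload);
        payloadBytesWritten += payload.length;
        writeIntents.put(rowId, new WriteIntent(dataId, txId));
    }

    /** Moves only the small index entry; the payload itself is not touched. */
    void commit(String rowId, long commitTimestamp) {
        WriteIntent intent = writeIntents.remove(rowId);
        if (intent == null) {
            throw new IllegalStateException("No write intent for row " + rowId);
        }
        committed.put(rowId + "@" + commitTimestamp, intent.dataId);
    }

    /** Exact-timestamp lookup only; a real storage would seek to the latest version <= timestamp. */
    byte[] readCommitted(String rowId, long commitTimestamp) {
        String dataId = committed.get(rowId + "@" + commitTimestamp);
        return dataId == null ? null : data.get(dataId);
    }

    public static void main(String[] args) {
        IndirectCommitSketch storage = new IndirectCommitSketch();
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        storage.addWrite("row1", "tx1", "row1/100", payload);
        storage.commit("row1", 200L);
        System.out.println(storage.payloadBytesWritten); // prints 5
    }
}
```

In this model the commit removes one index entry and adds another, both of which hold only a {{DataId}}, so the payload is written exactly once regardless of its size.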
{{DataId}} must be unique. I don't like the idea of an auto-incrementing key
(we would always have to persist the latest value); there must be another way.
The main idea is that {{DataId}} doesn't change while committing the data,
meaning that it can be generated from {{RowId}} and {{TxId}}.
For example, {{RowId ++ beginTimestamp(TxId)}} seems like a unique value (with
a mandatory partition ID prefix and probably a table ID prefix).
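A possible encoding of such a key could look as follows. This is a sketch, not a settled format: it assumes a 16-bit partition ID, a {{RowId}} represented as a 128-bit UUID, and a 64-bit transaction begin timestamp that the caller has already extracted from {{TxId}}. The optional table ID prefix is left out, and the names {{DataIdSketch}} and {{dataId}} are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

/** Hypothetical DataId encoding: [ PartId | RowId | beginTimestamp(TxId) ]. */
class DataIdSketch {
    /** 2 bytes partition ID + 16 bytes row ID + 8 bytes begin timestamp. */
    static final int DATA_ID_SIZE = Short.BYTES + 2 * Long.BYTES + Long.BYTES;

    /**
     * Builds the key deterministically from values that are already known when the
     * write intent is created, so nothing has to be persisted to generate it.
     */
    static byte[] dataId(int partitionId, UUID rowId, long txBeginTimestamp) {
        // ByteBuffer is big-endian by default, which keeps the partition prefix first in the key.
        return ByteBuffer.allocate(DATA_ID_SIZE)
                .putShort((short) partitionId)
                .putLong(rowId.getMostSignificantBits())
                .putLong(rowId.getLeastSignificantBits())
                .putLong(txBeginTimestamp)
                .array();
    }
}
```

Because the same inputs always produce the same bytes, the key computed when the write intent is created remains valid after commit, and no counter has to be stored. Uniqueness still depends on begin timestamps being distinct across the transactions that write to the same row, which is the assumption stated above.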