[
https://issues.apache.org/jira/browse/CASSANDRA-20280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dmitry Konstantinov updated CASSANDRA-20280:
--------------------------------------------
Attachment: (was: image-2025-02-01-15-08-31-205.png)
> More compact native memory layout for NativeCell
> ------------------------------------------------
>
> Key: CASSANDRA-20280
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20280
> Project: Apache Cassandra
> Issue Type: Improvement
> Components: Local/Memtable
> Reporter: Dmitry Konstantinov
> Assignee: Dmitry Konstantinov
> Priority: Normal
> Attachments: image-2025-02-01-15-10-15-176.png,
> image-2025-02-01-15-21-22-653.png
>
>
> Capturing an idea here; I am going to return to it after finishing the
> current in-progress tickets.
> The current NativeCell has the following native memory layout:
> So, when we store an integer value (4 bytes), the total size is 25 bytes.
> For an ASCII string of 10 characters it is 31 bytes, i.e. a fixed overhead
> of 21 bytes per cell in both cases.
> If we can make the layout more compact, more data can be stored in
> memtables, potentially with better cache locality.
> The idea is to use the first byte to store more flags to differentiate
> typical use cases:
> # A usual cell without TTL
> # Alive: we do not need to store localDeletionInfo
> # The value is frequently small; if it is not more than 128 bytes, we
> can use 1 byte to store the length (a varint is an alternative, but it makes
> it harder to calculate the offset of the data after it)
> So, we can introduce flags in the first byte such as:
> * has path
> * has TTL
> * has delete info
> * one byte length
> When we read a component of the cell, we read the flags and calculate the
> offset of that component from the information in the flags.
> For example:
> !image-2025-02-01-15-10-15-176.png|width=400!
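The flag-driven offset calculation described above could look roughly like the sketch below. It is only an illustration: the class name, the bit positions, the field order (flags, timestamp, optional TTL, optional deletion info, length, value) and the field widths are assumptions, not the layout from the attached image or the actual NativeCell code. The 21-byte overhead of the current layout is derived from the 25-byte and 31-byte figures in the ticket.

```java
/** Hypothetical sketch of a flag-driven cell layout; not the actual NativeCell code. */
final class CompactCellLayout {
    // Flag bits in the first byte (names and bit positions are assumptions).
    static final int HAS_PATH = 1;
    static final int HAS_TTL = 1 << 1;
    static final int HAS_DELETION_INFO = 1 << 2;
    static final int ONE_BYTE_LENGTH = 1 << 3;

    // Assumed field widths in bytes.
    static final int FLAGS_SIZE = 1;
    static final int TIMESTAMP_SIZE = 8;
    static final int TTL_SIZE = 4;
    static final int DELETION_SIZE = 4;

    // Threshold taken from the ticket text ("not more than 128 bytes");
    // the length byte would have to be read as unsigned.
    static final int ONE_BYTE_LENGTH_LIMIT = 128;

    // Fixed per-cell overhead of the current layout, derived from 25 - 4 = 31 - 10 = 21.
    static final int CURRENT_OVERHEAD = 21;

    /** Builds the flags byte for a cell with the given properties. */
    static int flagsFor(boolean hasPath, boolean hasTtl, boolean hasDeletionInfo, int valueLength) {
        int flags = 0;
        if (hasPath) flags |= HAS_PATH;
        if (hasTtl) flags |= HAS_TTL;
        if (hasDeletionInfo) flags |= HAS_DELETION_INFO;
        if (valueLength <= ONE_BYTE_LENGTH_LIMIT) flags |= ONE_BYTE_LENGTH;
        return flags;
    }

    /** The timestamp always follows the flags byte. */
    static int timestampOffset() {
        return FLAGS_SIZE;
    }

    /** Offset of the TTL field; only meaningful when HAS_TTL is set. */
    static int ttlOffset() {
        return timestampOffset() + TIMESTAMP_SIZE;
    }

    /** Offset of the deletion info; shifts by 4 bytes when a TTL is present. */
    static int deletionOffset(int flags) {
        return ttlOffset() + ((flags & HAS_TTL) != 0 ? TTL_SIZE : 0);
    }

    /** Offset of the value length field. */
    static int lengthOffset(int flags) {
        return deletionOffset(flags) + ((flags & HAS_DELETION_INFO) != 0 ? DELETION_SIZE : 0);
    }

    /** Offset of the value bytes; the length takes 1 or 4 bytes. */
    static int valueOffset(int flags) {
        return lengthOffset(flags) + ((flags & ONE_BYTE_LENGTH) != 0 ? 1 : 4);
    }

    /** Total size of a cell without a path under the assumed compact layout. */
    static int compactSize(int flags, int valueLength) {
        return valueOffset(flags) + valueLength;
    }

    /** Total size of a cell without a path under the current layout. */
    static int currentSize(int valueLength) {
        return CURRENT_OVERHEAD + valueLength;
    }
}
```

Under these assumptions, a live 4-byte integer cell without TTL takes 14 bytes instead of 25, and a 10-character ASCII string takes 20 bytes instead of 31. A cell that has both TTL and deletion info and a value larger than the one-byte limit ends up the same size as today.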
> These changes are local and encapsulated in the NativeCell logic.
> Additional but more complicated options to reduce the size:
> * Use delta encoding for the timestamp (similar to what we have in
> SSTables). This is more complicated logic: a base timestamp has to be stored
> somewhere (at the partition or memtable level), and the cell stores a delta
> (as a short or int) if the delta is small enough. A client can technically
> set the timestamp to a strange value, so we still have to support an option
> with a full long.
> * If the timestamp and LocalDelInfo have the same value (taking into account
> the micro to milliseconds conversion), we can use another flag bit to mark
> this, store only the timestamp, and calculate LocalDelInfo from it.
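As a rough illustration of the delta-encoding option, the sketch below decides whether a timestamp can be stored as a 4-byte delta against a base timestamp or has to fall back to a full long. The class and method names are hypothetical, the int-sized delta is just one of the two widths mentioned in the ticket, and where the base timestamp lives (partition or memtable level) is left open.

```java
/** Hypothetical sketch of timestamp delta encoding; not existing Cassandra code. */
final class TimestampDelta {
    static final int DELTA_SIZE = 4; // delta stored as an int (assumption)
    static final int FULL_SIZE = 8;  // fallback: full long timestamp

    /** Returns true if (timestamp - base) can be stored in a signed 32-bit int. */
    static boolean fitsInInt(long base, long timestamp) {
        long delta;
        try {
            // Client-supplied timestamps can be arbitrary, so guard against overflow.
            delta = Math.subtractExact(timestamp, base);
        } catch (ArithmeticException overflow) {
            return false;
        }
        return delta >= Integer.MIN_VALUE && delta <= Integer.MAX_VALUE;
    }

    /** Number of bytes the timestamp would occupy in the cell. */
    static int encodedSize(long base, long timestamp) {
        return fitsInInt(base, timestamp) ? DELTA_SIZE : FULL_SIZE;
    }

    /** Encodes the delta; callers must check fitsInInt first. */
    static int encode(long base, long timestamp) {
        if (!fitsInInt(base, timestamp)) {
            throw new IllegalArgumentException("delta does not fit in an int");
        }
        return (int) (timestamp - base);
    }

    /** Reconstructs the full timestamp from the base and the stored delta. */
    static long decode(long base, int delta) {
        return base + delta;
    }
}
```

With microsecond timestamps, a signed int delta covers a window of roughly 35 minutes on either side of the base, so a flag bit would still be needed to mark cells that keep the full long.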