[ https://issues.apache.org/jira/browse/CASSANDRA-20280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dmitry Konstantinov updated CASSANDRA-20280:
--------------------------------------------
Attachment: cells_heap_layout.png
> More compact native memory layout for NativeCell
> ------------------------------------------------
>
> Key: CASSANDRA-20280
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20280
> Project: Apache Cassandra
> Issue Type: Improvement
> Components: Local/Memtable
> Reporter: Dmitry Konstantinov
> Assignee: Dmitry Konstantinov
> Priority: Normal
> Fix For: 5.x
>
> Attachments: cells_heap_layout.png,
> image-2025-02-01-15-10-15-176.png, image-2025-02-01-15-21-22-653.png
>
>
> Capturing an idea here; I am going to return to it soon, after finishing
> the current in-progress tickets.
> The current NativeCell has the following native memory layout:
> !image-2025-02-01-15-21-22-653.png|width=400!
> So, when we store an integer value (4 bytes), the total size is 25 bytes;
> for an ASCII string of 10 characters, it is 31 bytes.
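Both figures follow from a fixed 21-byte header plus the value bytes. The sketch below models that header. The field offsets are an assumption inferred from the quoted sizes (1-byte flag, 8-byte timestamp, 4-byte TTL, 4-byte local deletion info, 4-byte length). The class name is hypothetical and not Cassandra's code. Cell paths are ignored.

```java
/**
 * Hypothetical model of the current NativeCell header, inferred from the
 * sizes quoted above: 21 fixed bytes followed by the value. Cell paths are ignored.
 */
final class CurrentNativeCellLayout {
    static final int HAS_PATH = 0;   // 1-byte "has path" flag
    static final int TIMESTAMP = 1;  // 8-byte timestamp
    static final int TTL = 9;        // 4-byte TTL
    static final int DELETION = 13;  // 4-byte local deletion info
    static final int LENGTH = 17;    // 4-byte value length
    static final int VALUE = 21;     // value bytes start here

    private CurrentNativeCellLayout() {}

    /** Total size of a cell without a path: fixed header plus value bytes. */
    static int sizeWithoutPath(int valueLength) {
        return VALUE + valueLength;
    }

    public static void main(String[] args) {
        System.out.println(sizeWithoutPath(4));  // int value: 25
        System.out.println(sizeWithoutPath(10)); // 10-character ASCII string: 31
    }
}
```

For a 4-byte value, 21 of the 25 bytes are per-cell overhead. That overhead is what the proposal below targets.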
> If we make the layout more compact, more data can be stored in memtables,
> potentially with better CPU cache utilization on lookup.
> The idea is to use more bits of the first byte as flags that distinguish
> typical use cases:
> # A usual cell has no TTL.
> # A live cell does not need to store localDeletionInfo.
> # Values are frequently small: if a value is shorter than 128 bytes (256 if
> the byte is read as unsigned), we can use 1 byte to store its length (varint is
> an alternative, but it makes the offset of the data after it harder to calculate).
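The one-byte length from the third case can be sketched as follows. This is a minimal illustration, assuming a plain `ByteBuffer` instead of Cassandra's native memory access. The class and method names are hypothetical. Reading the byte with `Byte.toUnsignedInt` extends the range from 127 to 255.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: store a value length in 1 byte when it fits in an
// unsigned byte (0..255), otherwise in a 4-byte int.
final class LengthEncoding {
    static final int MAX_ONE_BYTE_LENGTH = 0xFF; // 255 when read as unsigned

    private LengthEncoding() {}

    static boolean fitsInOneByte(int length) {
        return length >= 0 && length <= MAX_ONE_BYTE_LENGTH;
    }

    /** Writes the length at the buffer's current position and returns the bytes written. */
    static int writeLength(ByteBuffer buf, int length) {
        if (fitsInOneByte(length)) {
            buf.put((byte) length); // e.g. 200 is stored as (byte) -56
            return 1;
        }
        buf.putInt(length);
        return 4;
    }

    /**
     * Reads a length written by writeLength. The caller must know which form
     * was used; in the proposal, a flag bit in the first byte would record it.
     */
    static int readLength(ByteBuffer buf, int offset, boolean oneByte) {
        return oneByte ? Byte.toUnsignedInt(buf.get(offset)) : buf.getInt(offset);
    }
}
```

Unlike a varint, the width of the length field follows from a single flag bit. The offset of the value can therefore be computed without first decoding the length.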
> So, we can introduce flags in the first byte such as:
> * has path (the existing one)
> * has TTL
> * has delete info
> * one byte length
> When we read a component of the cell, we read the flags and calculate the
> offset of the component from them.
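The flag-driven offset calculation can be sketched as below. The bit values and the field order (`[flags:1][timestamp:8][ttl:4]?[deletion:4]?[length:1|4][value]`) are assumptions made for illustration. Only the set of flags comes from the list above. Cell paths are not modeled.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical compact layout; bit values and field order are assumptions:
 * [flags:1][timestamp:8][ttl:4]?[deletion:4]?[length:1|4][value]
 */
final class CompactCellLayout {
    static final int HAS_PATH        = 1;      // existing "has path" flag
    static final int HAS_TTL         = 1 << 1;
    static final int HAS_DELETION    = 1 << 2;
    static final int ONE_BYTE_LENGTH = 1 << 3;

    static final int TIMESTAMP_OFFSET = 1;                   // right after the flags byte
    static final int TTL_OFFSET = TIMESTAMP_OFFSET + 8;      // only meaningful if HAS_TTL is set

    private CompactCellLayout() {}

    static boolean has(int flags, int flag) {
        return (flags & flag) != 0;
    }

    /** Offset of the local deletion info; only meaningful if HAS_DELETION is set. */
    static int deletionOffset(int flags) {
        return TTL_OFFSET + (has(flags, HAS_TTL) ? 4 : 0);
    }

    static int lengthOffset(int flags) {
        return deletionOffset(flags) + (has(flags, HAS_DELETION) ? 4 : 0);
    }

    static int valueOffset(int flags) {
        return lengthOffset(flags) + (has(flags, ONE_BYTE_LENGTH) ? 1 : 4);
    }

    /** Reads the value length of a cell that starts at index 0 of the buffer. */
    static int valueLength(ByteBuffer cell) {
        int flags = Byte.toUnsignedInt(cell.get(0));
        int offset = lengthOffset(flags);
        return has(flags, ONE_BYTE_LENGTH)
                ? Byte.toUnsignedInt(cell.get(offset))
                : cell.getInt(offset);
    }
}
```

When all of `HAS_TTL` and `HAS_DELETION` are set and the length is 4 bytes, `valueOffset` returns 21, which matches the current fixed layout. Each lookup now adds a few branches and additions, which is the extra calculation mentioned as a downside below.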
> With this approach, the native memory overhead in typical cases can be
> reduced to the following values:
> !image-2025-02-01-15-10-15-176.png|width=400!
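A rough size estimate under the proposed flags can be computed with the sketch below. It assumes the same hypothetical field widths: a 1-byte flags field, an 8-byte timestamp, an optional 4-byte TTL, optional 4-byte local deletion info, and a 1- or 4-byte length. The numbers follow only from these assumptions and are not read off the attached image.

```java
// Hypothetical size estimate for the compact layout; cell paths are ignored.
final class CompactCellSize {
    private CompactCellSize() {}

    static int size(boolean hasTtl, boolean hasDeletion, int valueLength) {
        int size = 1 + 8;                      // flags byte + full timestamp
        if (hasTtl) size += 4;                 // TTL only when the flag is set
        if (hasDeletion) size += 4;            // local deletion info only when set
        size += valueLength <= 0xFF ? 1 : 4;   // one-byte length when it fits
        return size + valueLength;
    }

    public static void main(String[] args) {
        System.out.println(size(false, false, 4));  // live int cell, no TTL: 14
        System.out.println(size(false, false, 10)); // live 10-char ASCII string: 20
        System.out.println(size(true, true, 4));    // expiring int cell: 22
    }
}
```

Under these assumptions, the two examples from the description shrink from 25 to 14 bytes and from 31 to 20 bytes.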
> These changes are local and encapsulated in the NativeCell logic. The downside
> of the approach is that extra calculations are needed to look up a component of
> a NativeCell (read the flags; if the component is present, calculate its offset
> from them and read the component at that offset).
> Additional but more complicated options:
> * use delta encoding for the timestamp (similar to what SSTables do). This
> logic is more complicated: a base timestamp has to be stored somewhere (at the
> partition or memtable level), and the cell stores a delta (as a short or int)
> when the delta is small enough. Since a client can technically set the
> timestamp to any value, the option of storing a full long still has to be
> supported.
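The delta option could be sketched as below. This is only an illustration. It assumes the base timestamp is supplied by some hypothetical partition- or memtable-level holder. The flag bit value and class name are made up. Overflow is checked because client-supplied timestamps can be arbitrary.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: the base timestamp would live at partition or memtable level.
final class TimestampDelta {
    static final int INT_DELTA = 1 << 4; // hypothetical flag bit: timestamp stored as an int delta

    private TimestampDelta() {}

    /** True if timestamp - base fits in an int without overflowing a long first. */
    static boolean fitsInIntDelta(long timestamp, long base) {
        try {
            long delta = Math.subtractExact(timestamp, base);
            return delta >= Integer.MIN_VALUE && delta <= Integer.MAX_VALUE;
        } catch (ArithmeticException overflow) {
            return false; // arbitrary client timestamps can be far from the base
        }
    }

    /** Writes the timestamp and returns the flag bits to OR into the cell's flags byte. */
    static int write(ByteBuffer buf, long timestamp, long base) {
        if (fitsInIntDelta(timestamp, base)) {
            buf.putInt((int) (timestamp - base));
            return INT_DELTA;
        }
        buf.putLong(timestamp); // fallback: full long
        return 0;
    }

    static long read(ByteBuffer buf, int offset, int flags, long base) {
        return (flags & INT_DELTA) != 0 ? base + buf.getInt(offset) : buf.getLong(offset);
    }
}
```

With microsecond timestamps, an int delta covers roughly ±2147 seconds (about 36 minutes) around the base, while a short delta covers only about ±32 ms. The choice of delta width therefore depends on how long a memtable or partition stays open.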
> * if the timestamp and localDeletionInfo hold the same value (taking into
> account the unit conversion: the timestamp is in microseconds, the local
> deletion time in seconds), another flag bit can mark this case, so that only the
> timestamp is stored and localDeletionInfo is calculated from it.
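The last option could be checked at write time as in the sketch below. It assumes the timestamp is in microseconds and the local deletion time is in seconds. The flag bit and class name are hypothetical. If a client-supplied timestamp does not follow that convention, the check fails and the deletion time is stored as usual.

```java
// Hypothetical sketch; the flag bit value is made up for illustration.
final class DerivedDeletionTime {
    static final int DELETION_FROM_TIMESTAMP = 1 << 5; // hypothetical flag bit

    private DerivedDeletionTime() {}

    /** Converts microseconds to whole seconds, rounding toward negative infinity. */
    static long toSeconds(long timestampMicros) {
        return Math.floorDiv(timestampMicros, 1_000_000L);
    }

    /** True if the deletion time can be recomputed from the timestamp and need not be stored. */
    static boolean canDerive(long timestampMicros, long localDeletionTimeSeconds) {
        return toSeconds(timestampMicros) == localDeletionTimeSeconds;
    }

    /** Returns the derived value when the flag is set, otherwise the stored one. */
    static long localDeletionTime(int flags, long timestampMicros, long storedDeletionTime) {
        return (flags & DELETION_FROM_TIMESTAMP) != 0 ? toSeconds(timestampMicros) : storedDeletionTime;
    }
}
```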
--
This message was sent by Atlassian Jira
(v8.20.10#820010)