[
https://issues.apache.org/jira/browse/CASSANDRA-20280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dmitry Konstantinov updated CASSANDRA-20280:
--------------------------------------------
Change Category: Performance
Complexity: Normal
> More compact native memory layout for NativeCell
> ------------------------------------------------
>
> Key: CASSANDRA-20280
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20280
> Project: Apache Cassandra
> Issue Type: Improvement
> Components: Local/Memtable
> Reporter: Dmitry Konstantinov
> Assignee: Dmitry Konstantinov
> Priority: Normal
> Fix For: 5.x
>
> Attachments: image-2025-02-01-15-10-15-176.png,
> image-2025-02-01-15-21-22-653.png
>
>
> Capturing an idea here; I am going to return to it soon after finishing
> the current in-progress tickets.
> The current NativeCell has the following native memory layout:
> !image-2025-02-01-15-21-22-653.png|width=400!
> So, when we store an integer value (4 bytes), the total size is 25 bytes.
> For an ASCII string of 10 characters it is 31 bytes.
> If we can make the layout more compact, more data can be stored in memtables,
> potentially with better cache locality.
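
The two sizes quoted above imply a fixed per-cell overhead of 21 bytes (25 - 4 = 31 - 10 = 21). The sketch below only restates that arithmetic. The field breakdown in the comments is an assumption that happens to add up to 21 bytes; it is not taken from the attached image, and the class and method names are hypothetical.

```java
final class CurrentCellSizeSketch {
    // Assumed breakdown of the fixed part (hypothetical, sums to 21 bytes):
    // 1 (has-path flag) + 8 (timestamp) + 4 (TTL) + 4 (local deletion time) + 4 (value length)
    static final int FIXED_OVERHEAD = 1 + 8 + 4 + 4 + 4;

    // Total native size of a cell without a path under the assumed layout.
    static int currentSize(int valueLength) {
        return FIXED_OVERHEAD + valueLength;
    }

    public static void main(String[] args) {
        System.out.println(currentSize(4));   // 4-byte integer value
        System.out.println(currentSize(10));  // 10-character ASCII string
    }
}
```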
> The idea is to use the first byte to store more flags to differentiate
> typical use cases:
> # A usual cell without TTL
> # Alive: we do not need to store localDeletionInfo
> # The value is frequently small; if it is not more than 128 bytes, we can
> use 1 byte to store its length (a varint is an alternative, but it makes it
> harder to calculate the offset of the data after it)
> So, we can introduce flags in the first byte such as:
> * has path
> * has TTL
> * has delete info
> * one byte length
> When we read a component of the cell, we read the flags and calculate the
> component's offset using the information from the flags.
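
The flag-driven offset calculation described above could look roughly like the following. This is a minimal sketch, not the NativeCell implementation: the bit assignments, the field order, the use of a heap `ByteBuffer` in place of native memory, and all names are assumptions made for illustration. The one-byte length is limited to 127 here so that it fits a signed byte; the cell path is not modeled.

```java
import java.nio.ByteBuffer;

final class CompactCellSketch {
    // Hypothetical flag bits kept in the first byte of the cell.
    static final int HAS_PATH = 1;             // declared only; path storage is not modeled here
    static final int HAS_TTL = 1 << 1;
    static final int HAS_DELETION = 1 << 2;
    static final int ONE_BYTE_LENGTH = 1 << 3;

    static final int MAX_ONE_BYTE_LENGTH = 127; // assumption: length stored as a signed byte
    static final int TIMESTAMP_OFFSET = 1;      // timestamp follows the flags byte

    // Each offset depends only on the flags of the optional fields before it.
    static int ttlOffset() { return TIMESTAMP_OFFSET + 8; }
    static int deletionOffset(int flags) { return ttlOffset() + ((flags & HAS_TTL) != 0 ? 4 : 0); }
    static int lengthOffset(int flags) { return deletionOffset(flags) + ((flags & HAS_DELETION) != 0 ? 4 : 0); }
    static int valueOffset(int flags) { return lengthOffset(flags) + ((flags & ONE_BYTE_LENGTH) != 0 ? 1 : 4); }

    // Serializes a cell; a null ttl or localDeletion means the field is omitted.
    static ByteBuffer write(long timestamp, Integer ttl, Integer localDeletion, byte[] value) {
        int flags = 0;
        if (ttl != null) flags |= HAS_TTL;
        if (localDeletion != null) flags |= HAS_DELETION;
        if (value.length <= MAX_ONE_BYTE_LENGTH) flags |= ONE_BYTE_LENGTH;

        ByteBuffer buf = ByteBuffer.allocate(valueOffset(flags) + value.length);
        buf.put(0, (byte) flags);
        buf.putLong(TIMESTAMP_OFFSET, timestamp);
        if (ttl != null) buf.putInt(ttlOffset(), ttl);
        if (localDeletion != null) buf.putInt(deletionOffset(flags), localDeletion);
        if ((flags & ONE_BYTE_LENGTH) != 0) buf.put(lengthOffset(flags), (byte) value.length);
        else buf.putInt(lengthOffset(flags), value.length);
        for (int i = 0; i < value.length; i++) buf.put(valueOffset(flags) + i, value[i]);
        return buf;
    }

    // Reads the value back: read flags, derive offsets, then read the component.
    static byte[] readValue(ByteBuffer cell) {
        int flags = cell.get(0);
        int lenOff = lengthOffset(flags);
        int length = (flags & ONE_BYTE_LENGTH) != 0 ? cell.get(lenOff) : cell.getInt(lenOff);
        int valOff = valueOffset(flags);
        byte[] out = new byte[length];
        for (int i = 0; i < length; i++) out[i] = cell.get(valOff + i);
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer alive = write(42L, null, null, new byte[] {0, 0, 0, 7});
        System.out.println(alive.capacity()); // 1 + 8 + 1 + 4 = 14 under this sketch's layout
    }
}
```

Under these assumptions an alive 4-byte integer cell without TTL takes 14 bytes instead of 25; the sizes in the actual proposal may differ.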
> With this approach we can reduce the overhead to the following values in
> typical cases:
> !image-2025-02-01-15-10-15-176.png|width=400!
> These changes are local and encapsulated in the NativeCell logic. The downside
> of the approach: extra calculations are needed to look up a component of a
> NativeCell (read the flags; if the component is present, calculate its offset
> from them and read the component at that offset).
> Additional but more complicated options:
> * use delta encoding for the timestamp (similar to what we have in SSTables).
> This is more complicated logic: it requires storing a base timestamp somewhere
> (at the partition or memtable level) and storing the delta (as a short or
> int) in the cell if the delta is small enough. A timestamp can technically be
> set by a client to a strange value, so we still have to support the option of
> a full long.
> * if the timestamp and LocalDelInfo have the same value (taking into account
> the micro- to millisecond conversion), we can use another flag bit to mark
> this, store only the timestamp, and calculate LocalDelInfo from it.
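
The timestamp delta option above could choose the stored width per cell along these lines. This is a sketch under assumptions: the base timestamp is supplied by the caller (for example from the partition or memtable), the width would be recorded in flag bits that are not shown, and the class and method names are hypothetical.

```java
final class TimestampDeltaSketch {
    // Returns how many bytes are needed for the timestamp: 2 or 4 for a delta, 8 for a full long.
    static int encodedWidth(long baseTimestamp, long timestamp) {
        long delta;
        try {
            delta = Math.subtractExact(timestamp, baseTimestamp);
        } catch (ArithmeticException overflow) {
            return 8; // client-supplied timestamps can be arbitrary, so fall back to the full long
        }
        if (delta >= Short.MIN_VALUE && delta <= Short.MAX_VALUE) return 2;
        if (delta >= Integer.MIN_VALUE && delta <= Integer.MAX_VALUE) return 4;
        return 8;
    }

    // Value to store in the cell for the chosen width.
    static long encode(long baseTimestamp, long timestamp, int width) {
        return width == 8 ? timestamp : timestamp - baseTimestamp;
    }

    // Reconstructs the timestamp from the stored value.
    static long decode(long baseTimestamp, long stored, int width) {
        return width == 8 ? stored : baseTimestamp + stored;
    }

    public static void main(String[] args) {
        long base = 1_700_000_000_000_000L; // microseconds, illustrative value only
        long ts = base + 1_500L;
        int width = encodedWidth(base, ts);
        System.out.println(width + " bytes, round trip ok: "
                + (decode(base, encode(base, ts, width), width) == ts));
    }
}
```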