Dmitry Konstantinov created CASSANDRA-20280:
-----------------------------------------------

             Summary: More compact native memory layout for NativeCell
                 Key: CASSANDRA-20280
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20280
             Project: Apache Cassandra
          Issue Type: Improvement
          Components: Local/Memtable
            Reporter: Dmitry Konstantinov
            Assignee: Dmitry Konstantinov
         Attachments: image-2025-02-01-15-08-31-205.png, 
image-2025-02-01-15-10-15-176.png

Capturing an idea here; I am going to return to it after finishing the 
current in-progress tickets.

The current NativeCell has the following native memory layout:

!image-2025-02-01-14-57-40-857.png|width=400!

So, when we store an integer value (4 bytes), the total cell size is 25 bytes.

For an ASCII string of 10 characters, it is 31 bytes.
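
The two totals above imply a fixed per-cell overhead of 21 bytes (25 - 4 and 
31 - 10). A minimal Java sketch of that arithmetic follows. The per-field 
breakdown in the constants (1-byte flag, 8-byte timestamp, 4-byte TTL, 4-byte 
local deletion time, 4-byte value length) is an assumption that is consistent 
with the totals, not something taken from the attached image, and the class 
and method names are hypothetical. It covers a simple cell without a path.

```java
// Hypothetical helper for illustration; not part of Cassandra.
final class CurrentCellSize {
    // Assumed fixed header fields; together they give the 21 bytes
    // implied by the totals quoted in the ticket.
    static final int HAS_PATH_FLAG = 1;
    static final int TIMESTAMP = Long.BYTES;              // 8
    static final int TTL = Integer.BYTES;                 // 4
    static final int LOCAL_DELETION_TIME = Integer.BYTES; // 4
    static final int VALUE_LENGTH = Integer.BYTES;        // 4

    static final int FIXED_OVERHEAD =
        HAS_PATH_FLAG + TIMESTAMP + TTL + LOCAL_DELETION_TIME + VALUE_LENGTH;

    // Total native size of a cell without a path, for a value of the given size.
    static int cellSize(int valueBytes) {
        return FIXED_OVERHEAD + valueBytes;
    }

    public static void main(String[] args) {
        System.out.println(cellSize(4));  // integer value: prints 25
        System.out.println(cellSize(10)); // 10-character ASCII string: prints 31
    }
}
```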

If we can make the layout more compact, more data can be stored in memtables, 
potentially with better cache locality.

The idea is to use the first byte to store more flags that distinguish the 
typical use cases:
 # A usual cell without TTL: we do not need to store the TTL
 # An alive cell: we do not need to store localDeletionInfo
 # A small value: values are frequently small, and if a value is not more than 
128 bytes, we can use 1 byte to store its length (a varint is an alternative, 
but it makes it harder to calculate the offset of the data after it)

So, we can introduce flags in the first byte such as:
 * has path
 * has TTL
 * has delete info
 * one byte length

When we read a component of the cell, we first read the flags and then use 
them to calculate the offset of that component.
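
A rough Java sketch of such flag-driven offset calculation is below. 
Everything in it is an assumption made for illustration: the bit positions, 
the field order (flags, timestamp, optional TTL, optional deletion info, 
length, value, with any path placed after the value), and all class, constant 
and method names. It is not the layout from the attached image and not 
Cassandra code.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; bit positions and field order are assumptions.
final class CompactCellLayout {
    // Flag bits kept in the first byte of the cell.
    static final int HAS_PATH = 1;
    static final int HAS_TTL = 1 << 1;
    static final int HAS_DELETE_INFO = 1 << 2;
    static final int ONE_BYTE_LENGTH = 1 << 3;

    static final int ONE_BYTE_LENGTH_MAX = 128; // threshold mentioned in the ticket
    static final int FLAGS_OFFSET = 0;
    static final int TIMESTAMP_OFFSET = 1;
    static final int AFTER_TIMESTAMP = TIMESTAMP_OFFSET + Long.BYTES;

    static int flagsFor(boolean hasPath, boolean hasTtl, boolean hasDeleteInfo, int valueBytes) {
        int flags = 0;
        if (hasPath) flags |= HAS_PATH;
        if (hasTtl) flags |= HAS_TTL;
        if (hasDeleteInfo) flags |= HAS_DELETE_INFO;
        if (valueBytes <= ONE_BYTE_LENGTH_MAX) flags |= ONE_BYTE_LENGTH;
        return flags;
    }

    // Each offset skips only the optional fields that the flags say are present.
    static int deleteInfoOffset(int flags) {
        return AFTER_TIMESTAMP + ((flags & HAS_TTL) != 0 ? Integer.BYTES : 0);
    }

    static int lengthOffset(int flags) {
        return deleteInfoOffset(flags) + ((flags & HAS_DELETE_INFO) != 0 ? Integer.BYTES : 0);
    }

    static int valueOffset(int flags) {
        return lengthOffset(flags) + ((flags & ONE_BYTE_LENGTH) != 0 ? 1 : Integer.BYTES);
    }

    // Size of a cell without a path (a path, if any, is assumed to follow the value).
    static int cellSize(int flags, int valueBytes) {
        return valueOffset(flags) + valueBytes;
    }

    // Writes an alive cell without TTL and without a path into a heap buffer.
    static ByteBuffer writeAliveNoTtl(long timestamp, byte[] value) {
        int flags = flagsFor(false, false, false, value.length);
        ByteBuffer cell = ByteBuffer.allocate(cellSize(flags, value.length));
        cell.put(FLAGS_OFFSET, (byte) flags);
        cell.putLong(TIMESTAMP_OFFSET, timestamp);
        int lengthOffset = lengthOffset(flags);
        if ((flags & ONE_BYTE_LENGTH) != 0) cell.put(lengthOffset, (byte) value.length);
        else cell.putInt(lengthOffset, value.length);
        int valueOffset = valueOffset(flags);
        for (int i = 0; i < value.length; i++) cell.put(valueOffset + i, value[i]);
        return cell;
    }

    // Reads the flags first, then locates and decodes the length field.
    static int readValueLength(ByteBuffer cell) {
        int flags = cell.get(FLAGS_OFFSET) & 0xFF;
        int offset = lengthOffset(flags);
        return (flags & ONE_BYTE_LENGTH) != 0 ? (cell.get(offset) & 0xFF) : cell.getInt(offset);
    }

    public static void main(String[] args) {
        int flags = flagsFor(false, false, false, 4);
        System.out.println(cellSize(flags, 4));  // prints 14 under this assumed layout
        System.out.println(cellSize(flags, 10)); // prints 20 under this assumed layout
    }
}
```

Under these assumptions an alive integer cell without TTL takes 14 bytes 
and a 10-character ASCII cell takes 20 bytes. A cell with TTL, deletion info 
and a 4-byte length ends up with its value at offset 21, the same fixed 
overhead as the 25 and 31 byte totals quoted earlier imply.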

For example: 

!image-2025-02-01-15-10-15-176.png|width=400!

These changes are local and encapsulated in the NativeCell logic.

Additional but more complicated options to reduce the size:
 * Use delta encoding for the timestamp (similar to what we have in SSTables). 
The logic is more complicated: it requires storing a base timestamp somewhere 
(at the partition or memtable level) and storing the delta (as a short or int) 
in the cell if the delta is small enough. A client can technically set the 
timestamp to a strange value, so we still have to support the option of a full 
long.
 * If timestamp and LocalDelInfo have the same value (taking into account the 
micro to milliseconds conversion), we can use another flag bit to mark this, 
store only the timestamp, and calculate LocalDelInfo from it.
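
The two options above could look roughly like the following Java sketch. It 
is only an illustration under stated assumptions: the class and method names 
are hypothetical, the base timestamp is simply passed in rather than read from 
a partition or memtable, and the conversion factor between timestamp units and 
LocalDelInfo units is a parameter instead of a fixed constant.

```java
// Hypothetical sketch; not Cassandra code.
final class TimestampEncodingSketch {
    // Number of bytes needed to store the timestamp relative to a base:
    // 2 (short delta), 4 (int delta) or 8 (full long fallback).
    static int timestampWidth(long baseTimestamp, long timestamp) {
        long delta;
        try {
            delta = Math.subtractExact(timestamp, baseTimestamp);
        } catch (ArithmeticException overflow) {
            // A client-supplied "strange" timestamp: keep the full long.
            return Long.BYTES;
        }
        if (delta == (short) delta) return Short.BYTES;
        if (delta == (int) delta) return Integer.BYTES;
        return Long.BYTES;
    }

    // True when LocalDelInfo can be recomputed from the timestamp, so a flag bit
    // could replace the stored field. unitsPerDeletionTick is the assumed number
    // of timestamp units in one LocalDelInfo unit.
    static boolean canDeriveDeletionInfo(long timestamp, long localDeletionInfo, long unitsPerDeletionTick) {
        return Math.floorDiv(timestamp, unitsPerDeletionTick) == localDeletionInfo;
    }

    public static void main(String[] args) {
        System.out.println(timestampWidth(1_000_000L, 1_000_100L)); // prints 2
        System.out.println(timestampWidth(0L, 100_000L));           // prints 4
        System.out.println(timestampWidth(0L, Long.MAX_VALUE));     // prints 8
    }
}
```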



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
