Dmitry Konstantinov created CASSANDRA-20280:
-----------------------------------------------
Summary: More compact native memory layout for NativeCell
Key: CASSANDRA-20280
URL: https://issues.apache.org/jira/browse/CASSANDRA-20280
Project: Apache Cassandra
Issue Type: Improvement
Components: Local/Memtable
Reporter: Dmitry Konstantinov
Assignee: Dmitry Konstantinov
Attachments: image-2025-02-01-15-08-31-205.png,
image-2025-02-01-15-10-15-176.png
This ticket captures an idea; I am going to return to it after finishing the
tickets I currently have in progress.
The current NativeCell has the following native memory layout:
!image-2025-02-01-14-57-40-857.png|width=400!
So, when we store an integer value (4 bytes), the cell takes 25 bytes in total.
For an ASCII string of 10 characters, it takes 31 bytes.
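The sizes above imply a fixed 21-byte header per cell (25 - 4 = 31 - 10 = 21). The sketch below reproduces that arithmetic. It assumes the header is a 1-byte flag, an 8-byte timestamp, a 4-byte TTL, a 4-byte local deletion time and a 4-byte value length, which adds up to the implied 21 bytes. It also ignores the optional cell path. The class name `CurrentCellLayout` is hypothetical.

```java
// Sketch of the current fixed-width NativeCell header size.
// Field widths are an assumption that matches the 21-byte overhead implied
// by the sizes in the ticket; the optional cell path is ignored.
final class CurrentCellLayout {
    static final int FLAGS = 1;           // has-path flag byte
    static final int TIMESTAMP = 8;       // long
    static final int TTL = 4;             // int
    static final int LOCAL_DELETION = 4;  // int
    static final int LENGTH = 4;          // int value length

    static final int HEADER = FLAGS + TIMESTAMP + TTL + LOCAL_DELETION + LENGTH; // 21

    /** Total bytes for a cell without a path holding a value of the given size. */
    static int cellSize(int valueSize) {
        return HEADER + valueSize;
    }

    public static void main(String[] args) {
        System.out.println(cellSize(Integer.BYTES)); // 25: int value
        System.out.println(cellSize(10));            // 31: 10-character ASCII string
    }
}
```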
A more compact layout would let memtables hold more data and could potentially
improve cache locality.
The idea is to use the first byte to store more flags that distinguish the
typical use cases:
# A typical cell has no TTL.
# A live cell does not need to store localDeletionInfo.
# Values are frequently small; if a value is not more than 128 bytes, we can
use 1 byte to store its length (a varint is an alternative, but it makes it
harder to calculate the offset of the data that follows it).
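The following sketch illustrates the varint trade-off. It uses an unsigned LEB128-style varint as a generic stand-in, which is not necessarily the encoding Cassandra would use. With a varint, the width of the length field depends on the value, so the offset of the bytes after it is only known once the varint has been scanned. With a flag-selected one-byte or four-byte length, the offset follows from the flag alone. The class and method names are hypothetical.

```java
// Compare a varint length prefix with a flag-selected fixed-width prefix.
// VarintVsFixed and its methods are hypothetical names for illustration.
final class VarintVsFixed {
    /** Bytes needed by an unsigned LEB128-style varint (7 payload bits per byte). */
    static int varintSize(int value) {
        int size = 1;
        while ((value >>>= 7) != 0) {
            size++;
        }
        return size;
    }

    /** Writes value as a varint at pos and returns the position just past it. */
    static int writeVarint(byte[] buf, int pos, int value) {
        while ((value & ~0x7F) != 0) {
            buf[pos++] = (byte) ((value & 0x7F) | 0x80); // low 7 bits + continuation bit
            value >>>= 7;
        }
        buf[pos++] = (byte) value;
        return pos;
    }

    /** Finding the end of a varint requires scanning its bytes. */
    static int offsetAfterVarint(byte[] buf, int pos) {
        while ((buf[pos] & 0x80) != 0) { // continuation bit set: keep scanning
            pos++;
        }
        return pos + 1;
    }

    /** With a fixed-width prefix the offset needs only the flag, not the data. */
    static int offsetAfterFixedLength(int pos, boolean oneByteLength) {
        return pos + (oneByteLength ? 1 : 4);
    }

    public static void main(String[] args) {
        byte[] buf = new byte[8];
        System.out.println(writeVarint(buf, 0, 300));         // 2: 300 needs two varint bytes
        System.out.println(offsetAfterVarint(buf, 0));         // 2, found by scanning
        System.out.println(offsetAfterFixedLength(0, false));  // 4, known from the flag
    }
}
```

Note that a 128-byte value already needs a 2-byte varint, while an unsigned single byte can hold lengths up to 255.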
So we can introduce flags in the first byte, such as:
* has path
* has TTL
* has delete info
* one byte length
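The sketch below shows one way such a flag byte could be laid out and how the cell size would follow from it. The bit positions and the class name `CompactCellLayout` are assumptions. The 128-byte threshold is taken from the ticket. Fields that are present keep their current widths, and cell-path handling is omitted.

```java
// Hypothetical flag bits for the first byte of a compact NativeCell layout.
// Bit positions are assumptions; path handling is omitted from the size.
final class CompactCellLayout {
    static final int HAS_PATH        = 1;      // bit 0 (unused in this sketch)
    static final int HAS_TTL         = 1 << 1; // bit 1: TTL is stored
    static final int HAS_DELETION    = 1 << 2; // bit 2: localDeletionTime is stored
    static final int ONE_BYTE_LENGTH = 1 << 3; // bit 3: value length fits in one byte

    static final int ONE_BYTE_LENGTH_MAX = 128; // threshold mentioned in the ticket

    /** Chooses flags for a cell without a path. */
    static int flagsFor(boolean hasTtl, boolean hasDeletion, int valueSize) {
        int flags = 0;
        if (hasTtl) flags |= HAS_TTL;
        if (hasDeletion) flags |= HAS_DELETION;
        if (valueSize <= ONE_BYTE_LENGTH_MAX) flags |= ONE_BYTE_LENGTH;
        return flags;
    }

    /** Size of a cell without a path: only the fields selected by the flags are stored. */
    static int cellSize(int flags, int valueSize) {
        int size = 1 + Long.BYTES;                               // flag byte + timestamp
        if ((flags & HAS_TTL) != 0) size += Integer.BYTES;
        if ((flags & HAS_DELETION) != 0) size += Integer.BYTES;
        size += (flags & ONE_BYTE_LENGTH) != 0 ? 1 : Integer.BYTES;
        return size + valueSize;
    }

    public static void main(String[] args) {
        // Live cell, no TTL, 4-byte int value: 1 + 8 + 1 + 4 bytes
        System.out.println(cellSize(flagsFor(false, false, 4), 4));   // 14
        // Live cell, no TTL, 10-character ASCII value: 1 + 8 + 1 + 10 bytes
        System.out.println(cellSize(flagsFor(false, false, 10), 10)); // 20
    }
}
```

Under these assumptions, the two examples from the ticket would shrink from 25 to 14 bytes and from 31 to 20 bytes.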
When we read a component of the cell, we first read the flags and then
calculate the component's offset from them.
For example:
!image-2025-02-01-15-10-15-176.png|width=400!
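The sketch below shows the read path, where each field's offset is derived from the flag byte. A heap `java.nio.ByteBuffer` stands in for the off-heap address and Cassandra's own memory accessors. The assumed layout is: flag byte, timestamp, optional TTL, optional localDeletionTime, one-byte or four-byte length, value. The bit positions and the class name `CompactCellReader` are hypothetical.

```java
import java.nio.ByteBuffer;

// Flag-driven offset computation for a hypothetical compact cell layout.
// Bit positions are assumptions; a heap ByteBuffer stands in for native memory.
final class CompactCellReader {
    static final int HAS_TTL = 1 << 1, HAS_DELETION = 1 << 2, ONE_BYTE_LENGTH = 1 << 3;
    static final int TIMESTAMP_OFFSET = 1;                          // right after the flag byte
    static final int TTL_OFFSET = TIMESTAMP_OFFSET + Long.BYTES;    // TTL, if present

    static int deletionOffset(int flags) {
        return TTL_OFFSET + ((flags & HAS_TTL) != 0 ? Integer.BYTES : 0);
    }

    static int lengthOffset(int flags) {
        return deletionOffset(flags) + ((flags & HAS_DELETION) != 0 ? Integer.BYTES : 0);
    }

    static int valueOffset(int flags) {
        return lengthOffset(flags) + ((flags & ONE_BYTE_LENGTH) != 0 ? 1 : Integer.BYTES);
    }

    static long timestamp(ByteBuffer cell) {
        return cell.getLong(TIMESTAMP_OFFSET);
    }

    static int valueLength(ByteBuffer cell) {
        int flags = cell.get(0) & 0xFF;
        int pos = lengthOffset(flags);
        // Read the length as an unsigned byte or as a full int, depending on the flag.
        return (flags & ONE_BYTE_LENGTH) != 0 ? cell.get(pos) & 0xFF : cell.getInt(pos);
    }

    /** Returns a view of the value bytes without copying them. */
    static ByteBuffer value(ByteBuffer cell) {
        int start = valueOffset(cell.get(0) & 0xFF);
        ByteBuffer dup = cell.duplicate();
        dup.position(start);
        dup.limit(start + valueLength(cell));
        return dup.slice();
    }

    public static void main(String[] args) {
        // Live cell without TTL holding the int 42: flags, timestamp, 1-byte length, value.
        ByteBuffer cell = ByteBuffer.allocate(14);
        cell.put((byte) ONE_BYTE_LENGTH).putLong(1_700_000_000_000_000L)
            .put((byte) Integer.BYTES).putInt(42);
        System.out.println(valueOffset(ONE_BYTE_LENGTH)); // 10
        System.out.println(value(cell).getInt());         // 42
    }
}
```

With all optional fields present and a four-byte length, `valueOffset` returns 21, the same header size as the current layout.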
These changes are local and encapsulated in the NativeCell logic.
Additional, more complicated options:
* Use delta encoding for the timestamp to reduce the size (similar to what we
have in SSTables). The logic is more complicated: a base timestamp has to be
stored somewhere (at the partition or memtable level), and the cell stores the
delta (as a short or int) if the delta is small enough. Since a client can
technically set the timestamp to an arbitrary value, we still have to support
the option of storing a full long.
* If the timestamp and LocalDelInfo have the same value (taking into account
the microseconds-to-seconds conversion), we can use another flag bit to mark
this, store only the timestamp, and calculate LocalDelInfo from it.
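The sketch below shows the timestamp delta-encoding option. It assumes a hypothetical base timestamp kept per memtable (or partition), plus a flag bit that marks whether the cell stores a 4-byte delta or the full 8-byte long. A client timestamp too far from the base to fit in an int falls back to the full long. The class name, flag name and bit position are illustrative only.

```java
import java.nio.ByteBuffer;

// Hypothetical delta encoding of a cell timestamp against a base timestamp.
final class TimestampDelta {
    static final int DELTA_TIMESTAMP = 1 << 4; // flag bit: timestamp stored as an int delta

    /** True if timestamp - base fits in an int (overflow of the subtraction means no). */
    static boolean fitsInDelta(long base, long timestamp) {
        try {
            long delta = Math.subtractExact(timestamp, base);
            return delta == (int) delta;
        } catch (ArithmeticException overflow) {
            return false; // arbitrary client timestamp far from the base
        }
    }

    /** Writes the timestamp at pos and returns the flag bits to set (0 or DELTA_TIMESTAMP). */
    static int write(ByteBuffer buf, int pos, long base, long timestamp) {
        if (fitsInDelta(base, timestamp)) {
            buf.putInt(pos, (int) (timestamp - base));
            return DELTA_TIMESTAMP;
        }
        buf.putLong(pos, timestamp); // fall back to the full long
        return 0;
    }

    static long read(ByteBuffer buf, int pos, long base, int flags) {
        return (flags & DELTA_TIMESTAMP) != 0 ? base + buf.getInt(pos) : buf.getLong(pos);
    }

    static int storedSize(int flags) {
        return (flags & DELTA_TIMESTAMP) != 0 ? Integer.BYTES : Long.BYTES;
    }

    public static void main(String[] args) {
        long base = 1_700_000_000_000_000L;                  // e.g. first timestamp seen by the memtable
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
        int flags = write(buf, 0, base, base + 1_500_000L);  // 1.5 s later, in microseconds
        System.out.println(storedSize(flags));                // 4
        System.out.println(read(buf, 0, base, flags) - base); // 1500000
        flags = write(buf, 0, base, Long.MAX_VALUE);          // client-supplied outlier
        System.out.println(storedSize(flags));                // 8
    }
}
```

Using an int delta rather than a short covers a range of roughly ±35 minutes in microseconds. Whether that range is enough depends on how long a memtable lives, which this sketch does not address.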
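The sketch below covers the option of deriving LocalDelInfo from the timestamp. It assumes that timestamps are in microseconds and that the local deletion time is in seconds since the epoch. The flag name, bit position and class name are hypothetical. Local deletion times are held in a `long` here for simplicity.

```java
// Hypothetical check: store only the timestamp when the local deletion time
// can be derived from it. Assumes microsecond timestamps and second-based
// local deletion times.
final class DerivedDeletionTime {
    static final int DELETION_FROM_TIMESTAMP = 1 << 5; // flag bit (illustrative)
    static final long MICROS_PER_SECOND = 1_000_000L;

    /** Local deletion time (seconds) implied by a microsecond timestamp. */
    static long impliedDeletionTime(long timestampMicros) {
        return Math.floorDiv(timestampMicros, MICROS_PER_SECOND);
    }

    /** Returns the flag to set if the local deletion time need not be stored separately. */
    static int flagFor(long timestampMicros, long localDeletionTime) {
        return impliedDeletionTime(timestampMicros) == localDeletionTime
               ? DELETION_FROM_TIMESTAMP
               : 0;
    }

    /** Reads the local deletion time either from the timestamp or from the stored field. */
    static long localDeletionTime(int flags, long timestampMicros, long storedDeletionTime) {
        return (flags & DELETION_FROM_TIMESTAMP) != 0
               ? impliedDeletionTime(timestampMicros)
               : storedDeletionTime;
    }

    public static void main(String[] args) {
        long ts = 1_738_420_000_123_456L; // microseconds
        long ldt = 1_738_420_000L;        // seconds, same instant truncated
        int flags = flagFor(ts, ldt);
        System.out.println(flags != 0);                        // true: only ts is stored
        System.out.println(localDeletionTime(flags, ts, -1L)); // 1738420000 (-1: nothing stored)
    }
}
```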
--
This message was sent by Atlassian Jira
(v8.20.10#820010)