[
https://issues.apache.org/jira/browse/CASSANDRA-20280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dmitry Konstantinov updated CASSANDRA-20280:
--------------------------------------------
Description:
Capturing an idea here; I am going to return to it soon after finishing the
current in-progress tickets.
The current NativeCell has the following native memory layout:
!image-2025-02-01-15-21-22-653.png|width=400!
So, when we store an integer value (4 bytes), the total cell size is 25 bytes.
For an ASCII string of 10 characters it is 31 bytes.
If we make the layout more compact, more data can be stored in memtables, with
potentially better cache locality.
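The two figures above imply a fixed per-cell overhead of 21 bytes (25 - 4 and
31 - 10). A minimal sketch of that arithmetic follows. The split of the 21 bytes
into individual fields is an assumption made for illustration (the attached
image is the authoritative layout), and the class and constant names are
hypothetical.

```java
final class CurrentNativeCellSize {
    // Assumed field widths (illustrative only); they are chosen so that
    // they add up to the 21-byte overhead implied by the ticket's numbers.
    static final int HAS_PATH_FLAG = 1;   // one byte of flags
    static final int TIMESTAMP = 8;       // long
    static final int TTL = 4;             // int
    static final int LOCAL_DELETION = 4;  // int
    static final int VALUE_LENGTH = 4;    // int

    static final int FIXED_OVERHEAD =
        HAS_PATH_FLAG + TIMESTAMP + TTL + LOCAL_DELETION + VALUE_LENGTH;

    // Total native size of a cell without a path, for a value of the given size.
    static int cellSize(int valueBytes) {
        return FIXED_OVERHEAD + valueBytes;
    }

    public static void main(String[] args) {
        System.out.println("int value:      " + cellSize(4));
        System.out.println("10-char ASCII:  " + cellSize(10));
    }
}
```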
The idea is to use the first byte to store more flags to differentiate typical
use cases:
# A usual cell without TTL
# Alive: we do not need to store localDeletionInfo
# The value is frequently small; if it is not more than 128 bytes, we can use
1 byte to store its length (a varint is an alternative, but it makes it harder
to calculate the offset of the data after it)
So, we can introduce flags in the first byte such as:
* has path
* has TTL
* has delete info
* one byte length
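A rough sketch of how such a flags byte could be composed is shown here. The
bit positions, the class name and the method names are assumptions, not part of
the ticket; only the four flag meanings and the 128-byte threshold come from
the description above.

```java
final class CellFlags {
    // Hypothetical bit assignments; the ticket only names the flags.
    static final int HAS_PATH = 1;
    static final int HAS_TTL = 1 << 1;
    static final int HAS_DELETE_INFO = 1 << 2;
    static final int ONE_BYTE_LENGTH = 1 << 3;

    // Threshold taken from the ticket text. A length of 128 only fits in
    // one byte if that byte is read back as unsigned.
    static final int MAX_ONE_BYTE_LENGTH = 128;

    // Builds the first byte of the cell from the properties of the cell.
    static byte encode(boolean hasPath, boolean hasTtl, boolean hasDeleteInfo, int valueLength) {
        int flags = 0;
        if (hasPath) flags |= HAS_PATH;
        if (hasTtl) flags |= HAS_TTL;
        if (hasDeleteInfo) flags |= HAS_DELETE_INFO;
        if (valueLength <= MAX_ONE_BYTE_LENGTH) flags |= ONE_BYTE_LENGTH;
        return (byte) flags;
    }

    // Tests a single flag bit.
    static boolean has(byte flags, int flag) {
        return (flags & flag) != 0;
    }
}
```

For the typical case (alive, no TTL, no path, small value) only the
one-byte-length bit would be set.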
When we read a component of the cell, we read the flags and calculate the
component's offset from them.
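The offset calculation could look roughly like the sketch below. It assumes a
component order of flags, timestamp, optional TTL, optional deletion info,
length, value; that order, the field widths, the bit positions and all names
are hypothetical and are not taken from the ticket.

```java
final class CompactCellOffsets {
    // Hypothetical bit assignments for the flags byte.
    static final int HAS_TTL = 1 << 1;
    static final int HAS_DELETE_INFO = 1 << 2;
    static final int ONE_BYTE_LENGTH = 1 << 3;

    // Assumed layout: the 8-byte timestamp always follows the flags byte.
    static final int TIMESTAMP_OFFSET = 1;
    static final int AFTER_TIMESTAMP = TIMESTAMP_OFFSET + Long.BYTES;

    // Offset of the TTL, or -1 if the cell has no TTL.
    static int ttlOffset(int flags) {
        return (flags & HAS_TTL) != 0 ? AFTER_TIMESTAMP : -1;
    }

    // Offset of the deletion info, or -1 if it is not stored.
    static int deleteInfoOffset(int flags) {
        if ((flags & HAS_DELETE_INFO) == 0) return -1;
        return AFTER_TIMESTAMP + ((flags & HAS_TTL) != 0 ? Integer.BYTES : 0);
    }

    // Offset of the value length field (1 or 4 bytes wide).
    static int lengthOffset(int flags) {
        int offset = AFTER_TIMESTAMP;
        if ((flags & HAS_TTL) != 0) offset += Integer.BYTES;
        if ((flags & HAS_DELETE_INFO) != 0) offset += Integer.BYTES;
        return offset;
    }

    // Offset of the value itself; this is also the header overhead in bytes.
    static int valueOffset(int flags) {
        return lengthOffset(flags) + ((flags & ONE_BYTE_LENGTH) != 0 ? 1 : Integer.BYTES);
    }
}
```

Under these assumptions a cell with all optional components and a 4-byte
length has the same 21-byte header as today, while a cell with only the
one-byte-length flag set has a 10-byte header.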
With this approach the overhead can be reduced to the following values in
typical cases:
!image-2025-02-01-15-10-15-176.png|width=400!
These changes are local and encapsulated in the NativeCell logic. The downside
of the approach is that extra calculations are needed to look up a component of
a NativeCell: read the flags, check whether the component is present, calculate
its offset from the flags, and read the component at that offset.
Additional but more complicated options:
* use delta encoding for the timestamp (similar to what we have in SSTables).
This is more complicated: it requires storing a base timestamp somewhere (at the
partition or memtable level) and storing the delta (as a short or int) in the
cell if the delta is small enough. A client can technically set the timestamp to
any value, so we still have to support the option of a full long.
* if timestamp and LocalDelInfo have the same value (taking into account the
micro to milliseconds conversion), we can use another flag bit to mark this,
store only the timestamp, and calculate LocalDelInfo from it.
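To illustrate the delta-encoding option, here is a minimal sketch of choosing
the stored width for a timestamp relative to a base. The class and method names
are hypothetical, and where the base timestamp lives (partition or memtable
level) is left open, as in the description.

```java
final class TimestampDelta {
    // Returns how many bytes are needed to store the timestamp relative to
    // the base: 2 (short delta), 4 (int delta) or 8 (full long fallback).
    static int encodedWidth(long timestamp, long baseTimestamp) {
        long delta;
        try {
            delta = Math.subtractExact(timestamp, baseTimestamp);
        } catch (ArithmeticException overflow) {
            // Client-supplied timestamps can be arbitrary, so the difference
            // may not even fit in a long; store the full timestamp instead.
            return Long.BYTES;
        }
        if (delta >= Short.MIN_VALUE && delta <= Short.MAX_VALUE) return Short.BYTES;
        if (delta >= Integer.MIN_VALUE && delta <= Integer.MAX_VALUE) return Integer.BYTES;
        return Long.BYTES;
    }
}
```

The chosen width would itself have to be recorded, for example in further flag
bits, which is part of the extra complexity mentioned above.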
> More compact native memory layout for NativeCell
> ------------------------------------------------
>
> Key: CASSANDRA-20280
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20280
> Project: Apache Cassandra
> Issue Type: Improvement
> Components: Local/Memtable
> Reporter: Dmitry Konstantinov
> Assignee: Dmitry Konstantinov
> Priority: Normal
> Fix For: 5.x
>
> Attachments: image-2025-02-01-15-10-15-176.png,
> image-2025-02-01-15-21-22-653.png
>