[jira] [Updated] (CASSANDRA-20280) More compact native memory layout for NativeCell

Dmitry Konstantinov (Jira) Sat, 01 Feb 2025 07:45:22 -0800


     [ 
https://issues.apache.org/jira/browse/CASSANDRA-20280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Dmitry Konstantinov updated CASSANDRA-20280:
--------------------------------------------
    Description: 
To capture an idea here; I am going to return to it soon after finishing the 
current in-progress tickets.

The current NativeCell has the following native memory layout:

!image-2025-02-01-15-21-22-653.png|width=400!

So, when we store an integer value (4 bytes), the total size is 25 bytes.

For an ASCII string of 10 characters, the total size is 31 bytes.
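
As a rough check of these numbers, the sketch below assumes that the fixed part 
of the current layout is a 1-byte flag, an 8-byte timestamp, a 4-byte TTL, a 
4-byte local deletion time and a 4-byte value length (21 bytes in total). This 
breakdown is an assumption that is consistent with the two sizes quoted above; 
CurrentLayoutSize and its constants are hypothetical names, not Cassandra code.

```java
// Hypothetical helper (not part of Cassandra): reproduces the sizes quoted above
// for a cell without a cell path.
final class CurrentLayoutSize
{
    // Assumed fixed-width fields stored in front of the value.
    static final int FLAG = 1;           // "has path" flag byte
    static final int TIMESTAMP = 8;      // long
    static final int TTL = 4;            // int
    static final int LOCAL_DELETION = 4; // int
    static final int VALUE_LENGTH = 4;   // int
    static final int HEADER = FLAG + TIMESTAMP + TTL + LOCAL_DELETION + VALUE_LENGTH;

    static int cellSize(int valueBytes)
    {
        return HEADER + valueBytes;
    }

    public static void main(String[] args)
    {
        System.out.println(cellSize(4));  // 25: integer value
        System.out.println(cellSize(10)); // 31: 10-character ASCII string
    }
}
```

Under this assumption, the 21 bytes of per-cell overhead exceed the payload in 
both examples.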

If we can make it more compact, more data can be stored in memtables, with 
potentially better CPU cache usage on lookup.

The idea is to use the first byte to store more flags that distinguish typical 
use cases:
 # A regular cell without a TTL
 # A live cell: we do not need to store localDeletionInfo
 # A small value: values are frequently small, and if a value is not more than 
128 bytes, we can use 1 byte to store its length (a varint is an alternative, 
but it makes the offset of the data after it harder to calculate)

So, we can introduce flags in the first byte such as:
 * has path (the existing one)
 * has TTL
 * has delete info
 * one byte length

When we read a component of the cell, we read the flags and calculate the 
offset of the component using information from the flags.
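
A minimal sketch of this flag-driven offset calculation, using an on-heap 
ByteBuffer in place of native memory. The bit assignments, field widths, field 
order and all names (CompactCellSketch, lengthOffset, and so on) are assumptions 
made for illustration and are not the actual NativeCell code; the cell path is 
ignored for brevity.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a flag-driven cell layout:
// [flags:1][timestamp:8][ttl:4?][deletion:4?][length:1 or 4][value]
final class CompactCellSketch
{
    // Assumed bit assignments in the first byte.
    static final int HAS_PATH = 1;
    static final int HAS_TTL = 1 << 1;
    static final int HAS_DELETION = 1 << 2;
    static final int ONE_BYTE_LENGTH = 1 << 3;

    static final int MAX_ONE_BYTE_LENGTH = 128; // threshold taken from the description
    static final int TIMESTAMP_OFFSET = 1;
    static final int AFTER_TIMESTAMP = TIMESTAMP_OFFSET + Long.BYTES;

    // Each offset depends on which optional fields precede the component.
    static int deletionOffset(int flags)
    {
        return AFTER_TIMESTAMP + ((flags & HAS_TTL) != 0 ? Integer.BYTES : 0);
    }

    static int lengthOffset(int flags)
    {
        return deletionOffset(flags) + ((flags & HAS_DELETION) != 0 ? Integer.BYTES : 0);
    }

    static int valueOffset(int flags)
    {
        return lengthOffset(flags) + ((flags & ONE_BYTE_LENGTH) != 0 ? 1 : Integer.BYTES);
    }

    // Writes a live cell without TTL: neither the TTL nor the deletion field is stored.
    static ByteBuffer write(long timestamp, byte[] value)
    {
        boolean small = value.length <= MAX_ONE_BYTE_LENGTH;
        int flags = small ? ONE_BYTE_LENGTH : 0;
        ByteBuffer buf = ByteBuffer.allocate(valueOffset(flags) + value.length);
        buf.put(0, (byte) flags);
        buf.putLong(TIMESTAMP_OFFSET, timestamp);
        if (small)
            buf.put(lengthOffset(flags), (byte) value.length); // read back as unsigned
        else
            buf.putInt(lengthOffset(flags), value.length);
        buf.position(valueOffset(flags));
        buf.put(value);
        buf.rewind();
        return buf;
    }

    static long timestamp(ByteBuffer cell)
    {
        return cell.getLong(TIMESTAMP_OFFSET);
    }

    static int valueLength(ByteBuffer cell)
    {
        int flags = cell.get(0);
        int offset = lengthOffset(flags);
        return (flags & ONE_BYTE_LENGTH) != 0 ? cell.get(offset) & 0xFF : cell.getInt(offset);
    }

    static byte[] value(ByteBuffer cell)
    {
        int flags = cell.get(0);
        byte[] out = new byte[valueLength(cell)];
        ByteBuffer view = cell.duplicate(); // leave the caller's position untouched
        view.position(valueOffset(flags));
        view.get(out);
        return out;
    }

    public static void main(String[] args)
    {
        System.out.println(write(1L, new byte[4]).capacity());  // 14
        System.out.println(write(1L, new byte[10]).capacity()); // 20
    }
}
```

With these assumed widths, a live cell without TTL that holds a 4-byte value 
takes 14 bytes. The figures in the attached image may differ, since they depend 
on the exact set of flags and field widths chosen.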

Using this approach, we can reduce the native memory overhead to the following 
values in typical cases: 

!image-2025-02-01-15-10-15-176.png|width=400!

These changes are local and encapsulated in the NativeCell logic. The downside 
of the approach is that extra calculations are needed to look up a component in 
a NativeCell (read the flags; if the component is present, calculate its offset 
from them and read the component at that offset).

Additional but more complicated options:
 * Use delta encoding for the timestamp (similar to what we have in SSTables). 
This logic is more complicated: it requires storing a base timestamp somewhere 
(at the partition or memtable level) and storing a delta (as a short or int) in 
the cell if the delta is small enough. A client can technically set the 
timestamp to any value, so we still have to support the option with a full long.
 * If the timestamp and LocalDelInfo have the same value (taking into account 
the microsecond-to-second conversion), we can use another flag bit to mark 
this, store only the timestamp, and calculate LocalDelInfo from it.
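
The delta-encoding option could look roughly like the sketch below. It only 
illustrates choosing the narrowest width for a timestamp relative to a base 
value and falling back to a full long. TimestampDeltaSketch, encodedWidth and 
decode are hypothetical names, and where the base timestamp is kept (partition 
or memtable level) is left open.

```java
// Hypothetical sketch: pick the narrowest encoding for a timestamp relative to a base.
final class TimestampDeltaSketch
{
    // Returns the number of bytes needed: 2 (short delta), 4 (int delta) or 8 (full long).
    static int encodedWidth(long base, long timestamp)
    {
        long delta;
        try
        {
            delta = Math.subtractExact(timestamp, base);
        }
        catch (ArithmeticException overflow)
        {
            return Long.BYTES; // client-supplied timestamps can be arbitrary
        }
        if (delta >= Short.MIN_VALUE && delta <= Short.MAX_VALUE)
            return Short.BYTES;
        if (delta >= Integer.MIN_VALUE && delta <= Integer.MAX_VALUE)
            return Integer.BYTES;
        return Long.BYTES;
    }

    // 'stored' is the delta for 2- and 4-byte encodings and the full timestamp for 8 bytes.
    static long decode(long base, long stored, int width)
    {
        return width == Long.BYTES ? stored : base + stored;
    }

    public static void main(String[] args)
    {
        System.out.println(encodedWidth(1_000L, 1_005L));      // 2
        System.out.println(encodedWidth(0L, 100_000L));        // 4
        System.out.println(encodedWidth(0L, Long.MAX_VALUE));  // 8
    }
}
```

The chosen width would itself need to be recorded, for example in additional 
flag bits, which adds to the per-read offset calculation described above.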

  was:
To capture an idea here; I am going to return to it soon after finishing the 
current in-progress tickets.

The current NativeCell has the following native memory layout:

!image-2025-02-01-15-21-22-653.png|width=400!

So, when we store an integer value (4 bytes), the total size is 25 bytes.

For an ASCII string of 10 characters, the total size is 31 bytes.

If we can make it more compact, more data can be stored in memtables, with 
potentially better cache locality.

The idea is to use the first byte to store more flags that distinguish typical 
use cases:
 # A regular cell without a TTL
 # A live cell: we do not need to store localDeletionInfo
 # A small value: values are frequently small, and if a value is not more than 
128 bytes, we can use 1 byte to store its length (a varint is an alternative, 
but it makes the offset of the data after it harder to calculate)

So, we can introduce flags in the first byte such as:
 * has path (the existing one)
 * has TTL
 * has delete info
 * one byte length

When we read a component of the cell, we read the flags and calculate the 
offset of the component using information from the flags.

Using this approach, we can reduce the native memory overhead to the following 
values in typical cases: 

!image-2025-02-01-15-10-15-176.png|width=400!

These changes are local and encapsulated in the NativeCell logic. The downside 
of the approach is that extra calculations are needed to look up a component in 
a NativeCell (read the flags; if the component is present, calculate its offset 
from them and read the component at that offset).

Additional but more complicated options:
 * Use delta encoding for the timestamp (similar to what we have in SSTables). 
This logic is more complicated: it requires storing a base timestamp somewhere 
(at the partition or memtable level) and storing a delta (as a short or int) in 
the cell if the delta is small enough. A client can technically set the 
timestamp to any value, so we still have to support the option with a full long.
 * If the timestamp and LocalDelInfo have the same value (taking into account 
the microsecond-to-second conversion), we can use another flag bit to mark 
this, store only the timestamp, and calculate LocalDelInfo from it.


> More compact native memory layout for NativeCell
> ------------------------------------------------
>
>                 Key: CASSANDRA-20280
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20280
>             Project: Apache Cassandra
>          Issue Type: Improvement
>          Components: Local/Memtable
>            Reporter: Dmitry Konstantinov
>            Assignee: Dmitry Konstantinov
>            Priority: Normal
>             Fix For: 5.x
>
>         Attachments: image-2025-02-01-15-10-15-176.png, 
> image-2025-02-01-15-21-22-653.png



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

