[ https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15888911#comment-15888911 ]

Benjamin Roth commented on CASSANDRA-13241:
-------------------------------------------

How about this:

You create 2 chunk lookup tables. One with absolute pointers (long, 8 bytes).
A second one with relative pointers or chunk sizes - 2 bytes are enough for
chunks of up to 64kb.
You store an absolute pointer only for every $x-th chunk (1000 in this example).
So you get the absolute offset of chunk $pos by looking up the absolute pointer
at $idx = ($pos - ($pos % $x)) / $x
and then iterating through the size lookup from ($pos - ($pos % $x)) to
$pos - 1, summing up the chunk sizes.
A fallback can be provided for chunks >64kb: either relative pointers are
avoided completely, or they are widened to 3 bytes.

There you go.
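
To make the lookup concrete, here is a minimal Java sketch. All class and
field names are made up for illustration, nothing from the actual codebase:

// Hypothetical sketch of the two-level lookup described above.
// One 8-byte absolute offset per SPAN chunks, one 2-byte compressed
// size per chunk (assumes all compressed chunks are < 64kb).
class ChunkOffsets
{
    static final int SPAN = 1000; // $x: one absolute pointer per 1000 chunks

    final long[] absoluteOffsets; // ceil(chunks / SPAN) entries, 8 bytes each
    final short[] chunkSizes;     // one entry per chunk, 2 bytes each

    ChunkOffsets(long[] absoluteOffsets, short[] chunkSizes)
    {
        this.absoluteOffsets = absoluteOffsets;
        this.chunkSizes = chunkSizes;
    }

    // Offset of chunk $pos = absolute pointer of the span it falls into
    // plus the sizes of all chunks between the span start and $pos.
    long offsetOf(int pos)
    {
        int idx = pos / SPAN; // == ($pos - ($pos % $x)) / $x
        long offset = absoluteOffsets[idx];
        for (int i = idx * SPAN; i < pos; i++)
            offset += chunkSizes[i] & 0xFFFF; // unsigned 16-bit read
        return offset;
    }
}

Worst case is $x - 1 = 999 additions per lookup, so $x directly trades
lookup-table memory (8 / $x extra bytes per chunk) against CPU.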

Payload of 1 TB = 1024 * 1024 * 1024kb

CS 64 (NOW):
============
chunks = 1024 * 1024 * 1024kb / 64kb = 16777216 (16M)
compression = 1.99
compressed_size = 1024 * 1024 * 1024kb / 1.99 = 539568756kb
kernel_pages = 134892189
absolute_pointer_size = 8 * chunks = 134217728 (128MB)
kernel_page_size = 134892189 * 8 = 1079137512 (1029 MB)
total_size = 1157MB

CS 4 with relative positions
============================
chunks = 1024 * 1024 * 1024kb / 4kb = 268435456 (256M)
compression = 1.75
compressed_size = 1024 * 1024 * 1024kb / 1.75 = 613566757kb
kernel_pages = 153391689
absolute_pointer_size = 8 * chunks / 1000 = 2147484 (2 MB)
relative_pointer_size = 2 * chunks = 536870912 (512 MB)
kernel_page_size = 153391689 * 8 = 1227133512 (1170MB)
total_size = 1684MB

increase = 45%
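
For the record, the numbers above can be reproduced with this little
back-of-the-envelope program (same assumptions as above: 4kb kernel pages,
8 bytes of kernel struct per page, compression ratios from the Percona blog):

// Back-of-the-envelope check of the numbers above (all sizes in bytes).
public class ChunkOverhead
{
    public static void main(String[] args)
    {
        long payload = 1024L * 1024 * 1024 * 1024; // 1 TB

        // CS 64 (NOW): one absolute 8-byte pointer per chunk
        long chunks64 = payload / (64 * 1024);             // 16777216 (16M)
        long compressed64 = (long) (payload / 1.99);
        long pointers64 = 8 * chunks64;                    // 128 MB
        long kernel64 = compressed64 / 4096 * 8;           // ~1029 MB

        // CS 4: absolute pointer every 1000th chunk + 2-byte relative sizes
        long chunks4 = payload / (4 * 1024);               // 268435456 (256M)
        long compressed4 = (long) (payload / 1.75);
        long pointers4 = 8 * chunks4 / 1000 + 2 * chunks4; // ~2 MB + 512 MB
        long kernel4 = compressed4 / 4096 * 8;             // ~1170 MB

        long total64 = (pointers64 + kernel64) >> 20; // MB
        long total4 = (pointers4 + kernel4) >> 20;    // MB
        System.out.printf("CS 64: %d MB, CS 4: %d MB, increase: %d%%%n",
                          total64, total4, total4 * 100 / total64 - 100);
    }
}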

=> Reducing the chunk size from 64kb to 4kb then increases the memory overhead
by only 45% instead of the initially mentioned 800%, once you also take the
kernel structs into account - they are of a relevant size themselves, even
more than the initially discussed "128M" for the 64kb chunks.

Pro:
A lot less memory required

Con:
Some CPU overhead for summing up to $x - 1 chunk sizes per lookup. But is this
really relevant compared to the cost of decompressing a 4kb or even a 64kb
chunk?

P.S.: The kernel memory calculation is based on the 8 bytes per page that
[~aweisberg] has researched. Compression ratios are taken from the Percona blog.

> Lower default chunk_length_in_kb from 64kb to 4kb
> -------------------------------------------------
>
>                 Key: CASSANDRA-13241
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13241
>             Project: Cassandra
>          Issue Type: Wish
>          Components: Core
>            Reporter: Benjamin Roth
>
> Too low a chunk size may result in some wasted disk space. Too high a chunk 
> size may lead to massive overreads and can have a critical impact on overall 
> system performance.
> In my case, the default chunk size led to peak read IO of up to 1GB/s and 
> avg reads of 200MB/s. After lowering the chunk size (of course aligned with 
> read ahead), the avg read IO went below 20MB/s, more like 10-15MB/s.
> The risk of (physical) overreads increases with a lower (page cache size) / 
> (total data size) ratio.
> High chunk sizes are mostly appropriate for bigger payloads per request, but 
> if the model consists rather of small rows or small result sets, the read 
> overhead with a 64kb chunk size is insanely high. This applies, for example, 
> to (small) skinny rows.
> Please also see here:
> https://groups.google.com/forum/#!topic/scylladb-dev/j_qXSP-6-gY
> To give you some insights what a difference it can make (460GB data, 128GB 
> RAM):
> - Latency of a quite large CF: https://cl.ly/1r3e0W0S393L
> - Disk throughput: https://cl.ly/2a0Z250S1M3c
> - This shows that the request distribution remained the same, so no "dynamic 
> snitch magic": https://cl.ly/3E0t1T1z2c0J


