[
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886361#comment-15886361
]
Ben Bromhead commented on CASSANDRA-13241:
------------------------------------------
We generally end up recommending that our customers reduce chunk_length_in_kb
for most applications to roughly the average size of their reads (depending on
end latency goals), with a floor of the underlying disk's smallest read unit
(for SSDs this is usually the page size rather than the block size, iirc). This
ends up being anywhere from 2 KB to 16 KB depending on hardware. I would say
that driving higher IOPS and lower latencies through the disk, rather than
throughput, is much more aligned with the standard use cases for Cassandra.
4 KB is pretty common and I would be very happy with it as the default chunk
length, especially given that SSDs are pretty much the standard recommendation
for C*. Increasing the chunk length for better compression while sacrificing
read performance should be opt-in rather than the default.
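For anyone who wants this behaviour today, it can already be set per table. A
minimal sketch (keyspace and table names are placeholders); note that existing
SSTables keep their old chunk length until they are rewritten, e.g. by
compaction or nodetool upgradesstables:

    ALTER TABLE my_keyspace.my_table
    WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 4};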
+1
> Lower default chunk_length_in_kb from 64kb to 4kb
> -------------------------------------------------
>
> Key: CASSANDRA-13241
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13241
> Project: Cassandra
> Issue Type: Wish
> Components: Core
> Reporter: Benjamin Roth
>
> A chunk size that is too low may result in some wasted disk space. A chunk
> size that is too high may lead to massive overreads and can have a critical
> impact on overall system performance.
> In my case, the default chunk size led to peak read I/O of up to 1 GB/s and
> average reads of 200 MB/s. After lowering the chunk size (aligned with read
> ahead, of course), the average read I/O went below 20 MB/s, typically
> 10-15 MB/s.
> The risk of (physical) overreads increases as the (page cache size) /
> (total data size) ratio decreases.
> High chunk sizes are mostly appropriate for bigger payloads per request, but
> if the model consists mostly of small rows or small result sets, the read
> overhead with a 64 KB chunk size is insanely high. This applies, for example,
> to (small) skinny rows.
> Please also see here:
> https://groups.google.com/forum/#!topic/scylladb-dev/j_qXSP-6-gY
> To give you some insight into what a difference it can make (460 GB data,
> 128 GB RAM):
> - Latency of a quite large CF: https://cl.ly/1r3e0W0S393L
> - Disk throughput: https://cl.ly/2a0Z250S1M3c
> - This shows that the request distribution remained the same, so no "dynamic
> snitch magic": https://cl.ly/3E0t1T1z2c0J
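A quick way to see which tables are still on the 64 KB default discussed above
is to read the effective compression parameters back from the schema. A minimal
sketch against the Cassandra 3.x system_schema tables (keyspace and table names
are placeholders):

    SELECT compression
    FROM system_schema.tables
    WHERE keyspace_name = 'my_keyspace' AND table_name = 'my_table';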