[
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886361#comment-15886361
]
Ben Bromhead commented on CASSANDRA-13241:
------------------------------------------
We generally end up recommending that our customers reduce chunk_length_in_kb
for most applications to roughly the average size of their reads (depending on
end latency goals), with a floor of the underlying disk's smallest read unit
(for SSDs this is usually the page size rather than the block size, iirc). This
ends up being anywhere from 2 KB to 16 KB depending on hardware. I would say
that driving higher IOPS and lower latencies through the disk, rather than
throughput, is much more aligned with the standard use cases for Cassandra.
4 KB is pretty common and I would be very happy with it as the default chunk
length, especially given that SSDs are pretty much the standard recommendation
for C*. Increasing the chunk length for better compression while sacrificing
read performance should be opt-in rather than the default.
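For anyone who wants this behaviour today, it can already be set per table. A
minimal sketch (keyspace and table names are placeholders); note that existing
SSTables keep their old chunk length until they are rewritten, e.g. by
compaction or nodetool upgradesstables:

    ALTER TABLE my_keyspace.my_table
    WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 4};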
+1
> Lower default chunk_length_in_kb from 64kb to 4kb
> -------------------------------------------------
>
> Key: CASSANDRA-13241
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13241
> Project: Cassandra
> Issue Type: Wish
> Components: Core
> Reporter: Benjamin Roth
>
> A chunk size that is too low may result in some wasted disk space. A chunk
> size that is too high may lead to massive overreads and can have a critical
> impact on overall system performance.
> In my case, the default chunk size led to peak read I/O of up to 1 GB/s and
> average reads of 200 MB/s. After lowering the chunk size (aligned with read
> ahead, of course), the average read I/O went below 20 MB/s, typically
> 10-15 MB/s.
> The risk of (physical) overreads increases as the (page cache size) /
> (total data size) ratio decreases.
> High chunk sizes are mostly appropriate for bigger payloads per request, but
> if the model consists mostly of small rows or small result sets, the read
> overhead with a 64 KB chunk size is insanely high. This applies, for example,
> to (small) skinny rows.
> Please also see here:
> https://groups.google.com/forum/#!topic/scylladb-dev/j_qXSP-6-gY
> To give you some insight into what a difference it can make (460 GB data,
> 128 GB RAM):
> - Latency of a quite large CF: https://cl.ly/1r3e0W0S393L
> - Disk throughput: https://cl.ly/2a0Z250S1M3c
> - This shows that the request distribution remained the same, so no "dynamic
> snitch magic": https://cl.ly/3E0t1T1z2c0J
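A quick way to see which tables are still on the 64 KB default discussed above
is to read the effective compression parameters back from the schema. A minimal
sketch against the Cassandra 3.x system_schema tables (keyspace and table names
are placeholders):

    SELECT compression
    FROM system_schema.tables
    WHERE keyspace_name = 'my_keyspace' AND table_name = 'my_table';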