[
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889653#comment-15889653
]
Benjamin Roth commented on CASSANDRA-13241:
-------------------------------------------
I suggested 2 arrays because giving each one a single semantic meaning
(position vs. chunk size) and a single alignment (8 bytes vs. 3 or 2 bytes)
could be easier to understand and to maintain. Of course it works either way.
With 2 arrays, you could still "pull sections"; it just takes one more fetch to
get the 8-byte absolute offset.
Loop summing vs. "relative-absolute offset": In the end this is always a
tradeoff between memory and CPU. Personally, I am not one to fight for every
single byte in this case, but I also think a few extra CPU cycles to sum a
bunch of ints are still bearable. If I had to decide, I'd give "loop summing"
a try. Any other opinions?
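To make the "2 arrays plus loop summing" idea concrete, here is a minimal
sketch. It is not the existing Cassandra code: the class name, the
chunksPerSection parameter and the use of a plain int[] for chunk sizes
(instead of packed 2- or 3-byte entries) are all assumptions for illustration.

```java
/**
 * Hypothetical sketch, not Cassandra's actual compression metadata.
 * One array holds an 8-byte absolute offset per section of chunks,
 * the other holds the compressed size of every chunk.
 */
final class SectionedChunkOffsets {
    private final int chunksPerSection;  // assumed tuning knob
    private final long[] sectionOffsets; // absolute offset of the first chunk of each section
    private final int[] chunkSizes;      // compressed size per chunk (could be packed tighter)

    SectionedChunkOffsets(int[] chunkSizes, int chunksPerSection) {
        this.chunkSizes = chunkSizes.clone();
        this.chunksPerSection = chunksPerSection;
        int sections = (chunkSizes.length + chunksPerSection - 1) / chunksPerSection;
        this.sectionOffsets = new long[sections];
        long offset = 0;
        for (int i = 0; i < chunkSizes.length; i++) {
            if (i % chunksPerSection == 0)
                sectionOffsets[i / chunksPerSection] = offset;
            offset += chunkSizes[i];
        }
    }

    /** One fetch for the section's absolute offset, then "loop summing" of the sizes before the chunk. */
    long offsetOf(int chunkIndex) {
        int section = chunkIndex / chunksPerSection;
        long offset = sectionOffsets[section];
        for (int i = section * chunksPerSection; i < chunkIndex; i++)
            offset += chunkSizes[i];
        return offset;
    }
}
```

A larger chunksPerSection saves memory (fewer 8-byte entries) at the cost of a
longer summing loop per lookup, which is exactly the mem/cpu tradeoff above.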
Do you mean a ChunkCache cache miss? Sorry for these kinds of questions; I have
never come across this part of the code.
> Lower default chunk_length_in_kb from 64kb to 4kb
> -------------------------------------------------
>
> Key: CASSANDRA-13241
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13241
> Project: Cassandra
> Issue Type: Wish
> Components: Core
> Reporter: Benjamin Roth
>
> A chunk size that is too low may result in some wasted disk space. A chunk
> size that is too high may lead to massive overreads and may have a critical
> impact on overall system performance.
> In my case, the default chunk size led to peak read IO of up to 1GB/s and
> avg reads of 200MB/s. After lowering the chunk size (aligned with read ahead,
> of course), the avg read IO went below 20MB/s, to around 10-15MB/s.
> The risk of (physical) overreads increases as the (page cache size) /
> (total data size) ratio gets lower.
> High chunk sizes are mostly appropriate for bigger payloads per request, but
> if the model consists mostly of small rows or small resultsets, the read
> overhead with a 64kb chunk size is insanely high. This applies, for example,
> to (small) skinny rows.
> Please also see here:
> https://groups.google.com/forum/#!topic/scylladb-dev/j_qXSP-6-gY
> To give you an idea of the difference it can make (460GB data, 128GB
> RAM):
> - Latency of a quite large CF: https://cl.ly/1r3e0W0S393L
> - Disk throughput: https://cl.ly/2a0Z250S1M3c
> - This shows that the request distribution remained the same, so no "dynamic
> snitch magic": https://cl.ly/3E0t1T1z2c0J
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)