[
https://issues.apache.org/jira/browse/CASSANDRA-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134612#comment-14134612
]
Benedict commented on CASSANDRA-7928:
-------------------------------------
Regrettably this is very plausible, and lends credence to CASSANDRA-7130, which
we should consider reopening. This ticket is also a sensible idea to help
mitigate the issue.
I knocked up a quick benchmark, and the results show lz4 being consistently
faster than adler32, and at least twice as fast in all but one configuration.
It's actually quite easy to explain: if the data compresses well, there is
less data to operate over; if it does not compress easily (say, it is highly
random), lz4 degrades to a simple copy to avoid wasting work (as demonstrated
in the benchmark - with the 4..16 run length it's about 5 times faster over
completely random data than over partially random data).
{noformat}
Benchmark            (duplicateLookback)  (pageSize)  (randomRatio)  (randomRunLength)  (uniquePages)  Mode   Samples  Score    Score error  Units
Compression.adler32  4..128               65536       0              4..16              8192           thrpt  5        16.476   1.954        ops/ms
Compression.adler32  4..128               65536       0              128..512           8192           thrpt  5        16.720   0.230        ops/ms
Compression.adler32  4..128               65536       0.1            4..16              8192           thrpt  5        16.269   2.118        ops/ms
Compression.adler32  4..128               65536       0.1            128..512           8192           thrpt  5        16.665   0.246        ops/ms
Compression.adler32  4..128               65536       1.0            4..16              8192           thrpt  5        16.653   0.147        ops/ms
Compression.adler32  4..128               65536       1.0            128..512           8192           thrpt  5        16.686   0.214        ops/ms
Compression.lz4      4..128               65536       0              4..16              8192           thrpt  5        28.275   0.265        ops/ms
Compression.lz4      4..128               65536       0              128..512           8192           thrpt  5        232.602  48.279       ops/ms
Compression.lz4      4..128               65536       0.1            4..16              8192           thrpt  5        34.081   0.337        ops/ms
Compression.lz4      4..128               65536       0.1            128..512           8192           thrpt  5        130.857  18.157       ops/ms
Compression.lz4      4..128               65536       1.0            4..16              8192           thrpt  5        187.992  9.190        ops/ms
Compression.lz4      4..128               65536       1.0            128..512           8192           thrpt  5        186.054  2.267        ops/ms
{noformat}
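For anyone following the ticket's premise, here is a minimal sketch (not Cassandra code) of why a digest comparison would also surface corruption. It assumes the digest is an MD5 over the data read, which is my assumption here and not something stated in the ticket; the helper names and the sample payload are hypothetical. It only shows that both the block checksum and the digest change when a bit flips, and says nothing about how a mismatch is then handled.

```python
import hashlib
import zlib


def adler32_of(chunk: bytes) -> int:
    """Block-level checksum, analogous to the per-read Adler32 check."""
    return zlib.adler32(chunk) & 0xFFFFFFFF


def digest_of(chunk: bytes) -> bytes:
    """Stand-in for the digest a replica returns (MD5 is an assumption)."""
    return hashlib.md5(chunk).digest()


def flip_bit(chunk: bytes, index: int) -> bytes:
    """Simulate on-disk corruption by flipping the lowest bit of one byte."""
    corrupted = bytearray(chunk)
    corrupted[index] ^= 0x01
    return bytes(corrupted)


# Hypothetical payload standing in for the data read from an sstable.
clean = b"partition-key:42|col=a|col=b|" * 64
corrupt = flip_bit(clean, 100)

checksum_detects = adler32_of(clean) != adler32_of(corrupt)
digest_detects = digest_of(clean) != digest_of(corrupt)

print("adler32 detects corruption:", checksum_detects)
print("digest mismatch on corruption:", digest_detects)
```

In this sketch the digest already covers every byte that the checksum covers, which is the redundancy the ticket points at. With compressed sstables the corruption may instead show up as a decompression failure, which this example does not model.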
> Digest queries do not require adler32 checks
> --------------------------------------------
>
> Key: CASSANDRA-7928
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7928
> Project: Cassandra
> Issue Type: Improvement
> Reporter: sankalp kohli
> Priority: Minor
>
> While reading data from sstables, C* does Adler32 checks for any data being
> read. Kernel profiling shows that these checks cause higher CPU usage. They
> might not be useful for digest queries, as a corrupted replica will return a
> different digest anyway.
>