[ https://issues.apache.org/jira/browse/CASSANDRA-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134612#comment-14134612 ]

Benedict commented on CASSANDRA-7928:
-------------------------------------

Regrettably, this is very plausible, and it adds credence to CASSANDRA-7130, 
which we should consider reopening. This ticket is also a sensible way to help 
mitigate the issue. 

I knocked up a quick benchmark, and the results show lz4 being consistently 
faster than adler32: roughly 1.7 times as fast in the worst case, and more than 
twice as fast in every other configuration. It's actually quite easy to 
explain: if the data is compressible, there is less data to operate over; if 
it is not easily compressed (say, it is highly random), lz4 degrades to a 
simple copy to avoid wasting work. The benchmark demonstrates this: with short 
random runs (randomRunLength 4..16), lz4 is about 5 times faster over completely 
random data than over partially random data.

{noformat}
Benchmark            (duplicateLookback)  (pageSize)  (randomRatio)  (randomRunLength)  (uniquePages)   Mode  Samples     Score  Score error   Units
Compression.adler32               4..128       65536              0              4..16           8192  thrpt        5    16.476        1.954  ops/ms
Compression.adler32               4..128       65536              0           128..512           8192  thrpt        5    16.720        0.230  ops/ms
Compression.adler32               4..128       65536            0.1              4..16           8192  thrpt        5    16.269        2.118  ops/ms
Compression.adler32               4..128       65536            0.1           128..512           8192  thrpt        5    16.665        0.246  ops/ms
Compression.adler32               4..128       65536            1.0              4..16           8192  thrpt        5    16.653        0.147  ops/ms
Compression.adler32               4..128       65536            1.0           128..512           8192  thrpt        5    16.686        0.214  ops/ms
Compression.lz4                   4..128       65536              0              4..16           8192  thrpt        5    28.275        0.265  ops/ms
Compression.lz4                   4..128       65536              0           128..512           8192  thrpt        5   232.602       48.279  ops/ms
Compression.lz4                   4..128       65536            0.1              4..16           8192  thrpt        5    34.081        0.337  ops/ms
Compression.lz4                   4..128       65536            0.1           128..512           8192  thrpt        5   130.857       18.157  ops/ms
Compression.lz4                   4..128       65536            1.0              4..16           8192  thrpt        5   187.992        9.190  ops/ms
Compression.lz4                   4..128       65536            1.0           128..512           8192  thrpt        5   186.054        2.267  ops/ms
{noformat}
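As a rough illustration of what the adler32 rows measure, here is a sketch that checksums one 64 KiB page (the pageSize of 65536 above) with the JDK's java.util.zip.Adler32. It is not the JMH benchmark that produced the table; the class name Adler32PageBench, the seeded random fill and the plain timing loop are assumptions, so any numbers it prints are only indicative. An lz4 counterpart would need a third-party library and is left out.

```java
import java.util.Random;
import java.util.zip.Adler32;

// Hypothetical sketch of the adler32 side of the benchmark. This is a plain
// timing loop, not the JMH harness behind the table, so output is indicative only.
public class Adler32PageBench {
    // Matches the (pageSize) parameter in the table: 64 KiB per page.
    static final int PAGE_SIZE = 65536;

    // Compute the Adler32 checksum of one page, as a per-chunk check would.
    static long checksum(byte[] page) {
        Adler32 adler = new Adler32();
        adler.update(page, 0, page.length);
        return adler.getValue();
    }

    public static void main(String[] args) {
        // Fill one page with random bytes; the fixed seed makes runs repeatable.
        byte[] page = new byte[PAGE_SIZE];
        new Random(42).nextBytes(page);

        // Warm-up so the JIT compiles checksum() before timing starts.
        long sink = 0;
        for (int i = 0; i < 5_000; i++) {
            sink += checksum(page);
        }

        int iterations = 20_000;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += checksum(page);
        }
        double elapsedMs = (System.nanoTime() - start) / 1_000_000.0;

        // Printing sink keeps the JIT from eliminating the loop as dead code.
        System.out.printf("adler32: %.3f ops/ms (sink=%d)%n", iterations / elapsedMs, sink);
    }
}
```

A real comparison should use JMH, as the table above does, to control for warm-up, dead-code elimination and run-to-run variance.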


> Digest queries do not require alder32 checks
> --------------------------------------------
>
>                 Key: CASSANDRA-7928
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7928
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Priority: Minor
>
>  While reading data from sstables, C* performs Adler32 checks on all data 
> being read. We have seen in kernel profiling that this causes higher CPU 
> usage. These checks might not be needed for digest queries, since corrupted 
> data would produce a different digest anyway. 
>  
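The reasoning in the description can be sketched as follows. This is a hypothetical model, not Cassandra's read path: DigestReadSketch, Chunk, write, readData and readDigest are invented names, and MD5 is assumed as the digest function. A data read verifies the Adler32 recorded at write time, while a digest read skips that check and only hashes the bytes; a single flipped bit still yields a digest that differs from a healthy replica's, so the coordinator would see a mismatch.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.zip.Adler32;

// Hypothetical model of the idea in the ticket, not Cassandra's actual read path.
public class DigestReadSketch {
    // A stored chunk: its bytes plus the Adler32 recorded when it was written.
    static final class Chunk {
        final byte[] data;
        final long storedChecksum;

        Chunk(byte[] data, long storedChecksum) {
            this.data = data;
            this.storedChecksum = storedChecksum;
        }
    }

    static long adler32(byte[] bytes) {
        Adler32 adler = new Adler32();
        adler.update(bytes, 0, bytes.length);
        return adler.getValue();
    }

    // Write path: copy the bytes and record their checksum.
    static Chunk write(byte[] data) {
        return new Chunk(data.clone(), adler32(data));
    }

    // Data read: verify the checksum before handing bytes back to a client.
    static byte[] readData(Chunk chunk) {
        if (adler32(chunk.data) != chunk.storedChecksum) {
            throw new IllegalStateException("checksum mismatch: corrupted chunk");
        }
        return chunk.data;
    }

    // Digest read: skip the Adler32 check and only hash the bytes (MD5 assumed).
    static byte[] readDigest(Chunk chunk) {
        try {
            return MessageDigest.getInstance("MD5").digest(chunk.data);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] row = "key1:value1".getBytes(StandardCharsets.UTF_8);
        Chunk healthy = write(row);
        Chunk corrupted = write(row);
        corrupted.data[3] ^= 0x01; // flip one bit after the checksum was recorded

        // The coordinator compares replica digests; corruption shows up as a mismatch.
        boolean match = Arrays.equals(readDigest(healthy), readDigest(corrupted));
        System.out.println("digests match: " + match); // prints "digests match: false"
    }
}
```

On a mismatch the coordinator would fall back to full data reads, which in this model still go through readData and its Adler32 verification.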



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
