[jira] [Commented] (CASSANDRA-4710) High key hashing overhead for index scans when using RandomPartitioner

Daniel Norberg (JIRA) Mon, 24 Sep 2012 19:16:11 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-4710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462360#comment-13462360 ]


Daniel Norberg commented on CASSANDRA-4710:
-------------------------------------------

The check against DatabaseDescriptor.getIndexInterval is there so that the loop
can exit when the key being looked up is not present in the index.

When comparing tokens, the loop can exit as soon as it encounters an index
entry whose token is greater than the needle's, because the index is sorted by
token; that is what the if (v < 0) return null is for. When comparing raw keys,
however, we have to look through every entry in the section of the index that
the sampled index pointed us to before we can know that a key is not present.
Fortunately, this should be rare: for EQ lookups, key presence is checked
against the bloom filter before the index is read, so only bloom filter false
positives should reach the full scan.
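A minimal Java sketch of the two loop shapes described above, assuming a
hypothetical in-memory index section. The names IndexScanSketch, Entry, Lookup,
buildIndex, findByToken and findByRawKey are illustrative only, not Cassandra's
API. The token path hashes each visited entry's key with MD5 (modeled loosely
on RandomPartitioner) but can stop at the first entry that sorts after the
needle; the raw-key path does no hashing but must visit the whole section on a
miss. Token collisions are simplified away.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class IndexScanSketch {
    // Hypothetical index entry: raw key bytes plus the key's position in the data file.
    record Entry(byte[] key, long position) {}

    // Result of a lookup: the position (null if absent) and how many entries were visited.
    record Lookup(Long position, int visited) {}

    // MD5-based token, modeled on RandomPartitioner (assumption: abs() of the signed digest).
    static BigInteger token(byte[] key) {
        try {
            return new BigInteger(MessageDigest.getInstance("MD5").digest(key)).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds an index sorted by token, as an index would be under an MD5-based partitioner.
    static List<Entry> buildIndex(String... keys) {
        List<byte[]> raw = new ArrayList<>();
        for (String k : keys) raw.add(k.getBytes(StandardCharsets.UTF_8));
        raw.sort(Comparator.comparing(IndexScanSketch::token));
        List<Entry> index = new ArrayList<>();
        for (int i = 0; i < raw.size(); i++) index.add(new Entry(raw.get(i), i * 100L));
        return index;
    }

    // Token comparison: hashes each entry's key, but can stop early because the index is token-sorted.
    static Lookup findByToken(List<Entry> index, int start, int interval, byte[] needle) {
        BigInteger needleToken = token(needle);
        int end = Math.min(start + interval, index.size());
        int visited = 0;
        for (int i = start; i < end; i++) {
            Entry e = index.get(i);
            visited++;
            int v = needleToken.compareTo(token(e.key())); // one MD5 per visited entry
            if (v == 0 && Arrays.equals(needle, e.key())) return new Lookup(e.position(), visited);
            if (v < 0) return new Lookup(null, visited); // entry sorts after the needle: key absent
        }
        return new Lookup(null, visited);
    }

    // Raw key comparison: no hashing, but a miss is only certain after scanning the whole section.
    static Lookup findByRawKey(List<Entry> index, int start, int interval, byte[] needle) {
        int end = Math.min(start + interval, index.size());
        int visited = 0;
        for (int i = start; i < end; i++) {
            Entry e = index.get(i);
            visited++;
            if (Arrays.equals(needle, e.key())) return new Lookup(e.position(), visited);
        }
        return new Lookup(null, visited);
    }

    public static void main(String[] args) {
        List<Entry> index = buildIndex("alice", "bob", "carol", "dave", "erin");
        byte[] miss = "zoe".getBytes(StandardCharsets.UTF_8);
        // Treat the whole list as one sampled-index section of size index.size().
        System.out.println("token miss: " + findByToken(index, 0, index.size(), miss));
        System.out.println("raw   miss: " + findByRawKey(index, 0, index.size(), miss));
    }
}
```

On a miss, the raw-key path always visits every entry in the section, while
the token path usually stops earlier but pays one MD5 per entry it visits. For
EQ lookups the bloom filter keeps such misses rare, which is why skipping the
hashing can be the better trade.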
                
> High key hashing overhead for index scans when using RandomPartitioner
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-4710
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4710
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Daniel Norberg
>         Attachments: 0001-SSTableReader-compare-raw-key-when-scanning-index.patch
>
>
> For a workload where the dataset is completely in RAM, profiling shows that
> the MD5 hashing of keys during index scans becomes a bottleneck for reads
> when using RandomPartitioner.
> Performing a raw key equality check in SSTableReader.getPosition() for EQ
> operations instead improves throughput by some 30% for my workload (moving
> the bottleneck elsewhere).
