[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323080#comment-14323080
 ] 

Benedict commented on CASSANDRA-7282:
-------------------------------------

bq. This benchmark uses Longs as keys

Thanks for spotting that

bq. However, if even the author of the code can make this mistake so easily

It's worth noting that I was recovering from brain surgery performed a couple 
of weeks prior to posting this benchmark, so it's probably not _quite_ as 
indicative of the problem as it might appear. I agree it would be preferable to 
use a different method, though. The problem is doing so neatly and without 
penalty. The best option is probably to make all Token implement a 
HashComparable interface, and throw UnsupportedOperationException if they don't 
really implement it, but that is also pretty ugly. I'm pretty agnostic to the 
solution to this particular problem, so I'll defer to strong opinions.

A potentially more damning criticism of that benchmark is that it assumes a 
uniform address space for the hashes. As the number of nodes in a cluster 
grows, the entropy in the top bits decreases, and so the performance of this 
map could degrade. Besides the aforementioned improvement to bound worst case 
behaviour at O(lg(N)) we could also normalise the top order bits across the 
owned ranges. Possibly there are some other strategies, but I need to think on 
that.

> Faster Memtable map
> -------------------
>
>                 Key: CASSANDRA-7282
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, 
> run1.svg, writes.svg
>
>
> Currently we maintain a ConcurrentSkipLastMap of DecoratedKey -> Partition in 
> our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
> majority of users use a hash partitioner, it occurs to me we could maintain a 
> hybrid ordered list / hash map. The list would impose the normal order on the 
> collection, but a hash index would live alongside as part of the same data 
> structure, simply mapping into the list and permitting O(1) lookups and 
> inserts.
> I've chosen to implement this initial version as a linked-list node per item, 
> but we can optimise this in future by storing fatter nodes that permit a 
> cache-line's worth of hashes to be checked at once,  further reducing the 
> constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to