[jira] [Comment Edited] (CASSANDRA-7282) Faster Memtable map

Benedict (JIRA) Sat, 13 Sep 2014 14:56:55 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14132953#comment-14132953
 ]


Benedict edited comment on CASSANDRA-7282 at 9/13/14 9:55 PM:
--------------------------------------------------------------

bq. Since this is the case, should we restrict this method to Murmur tokens 
only?

Well, if I can spend the time ensuring safety we should be able to use this for 
RandomPartitioner also.

bq. I would still change the condition in the preface to use non-strict 
inequality, though, because cropping tokens to 32 bits will introduce 
collisions.

-There will be collisions with or without truncation. The fact that there are 
collisions doesn't affect the constraint I've imposed upon the data; we assume 
nothing about the dataset when two hashCode()s are equal, and simply resort to 
the underlying token comparator, so I think the constraint is sufficient. 
Non-strict equality is surely more broken, however? It would hold true for 
k1:1, k2:2, hash(k1):2, hash(k2):2. We could strengthen it to a bi-directional 
implication, i.e. k1 <= k2 <==> k1.hashCode() <= k2.hashCode().-

I just reread your comment and realise you meant the RHS only. Agreed this 
should be changed. However we should probably opt for the stronger 
bidirectional constraint, since it is still more correct.

bq. I wouldn't even call this a hashCode to avoid the confusion, perhaps an 
"ordering key" or "ordering prefix"?

Perhaps. This was the simplest approach, and since it *is* a hash key used to 
index a hash table it seems suitable to use hashCode(), and impose the extra 
constraint contextually. I'm fairly neutral though; we certainly could 
introduce a new interface. I'll see how it looks.


was (Author: benedict):
bq. Since this is the case, should we restrict this method to Murmur tokens 
only?

Well, if I can spend the time ensuring safety we should be able to use this for 
RandomPartitioner also.

bq. I would still change the condition in the preface to use non-strict 
inequality, though, because cropping tokens to 32 bits will introduce 
collisions.

-There will be collisions with or without truncation. The fact that there are 
collisions doesn't affect the constraint I've imposed upon the data; we assume 
nothing about the dataset when two hashCode()s are equal, and simply resort to 
the underlying token comparator, so I think the constraint is sufficient. 
Non-strict equality is surely more broken, however? It would hold true for 
k1:1, k2:2, hash(k1):2, hash(k2):2. We could strengthen it to a bi-directional 
implication, i.e. k1 <= k2 <==> k1.hashCode() <= k2.hashCode().-

I just reread your comment and realise you meant the RHS only. Agreed this 
should be changed.

bq. I wouldn't even call this a hashCode to avoid the confusion, perhaps an 
"ordering key" or "ordering prefix"?

Perhaps. This was the simplest approach, and since it *is* a hash key used to 
index a hash table it seems suitable to use hashCode(), and impose the extra 
constraint contextually. I'm fairly neutral though; we certainly could 
introduce a new interface. I'll see how it looks.

> Faster Memtable map
> -------------------
>
>                 Key: CASSANDRA-7282
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: profile.yaml, reads.svg, run1.svg, writes.svg
>
>
> Currently we maintain a ConcurrentSkipLastMap of DecoratedKey -> Partition in 
> our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
> majority of users use a hash partitioner, it occurs to me we could maintain a 
> hybrid ordered list / hash map. The list would impose the normal order on the 
> collection, but a hash index would live alongside as part of the same data 
> structure, simply mapping into the list and permitting O(1) lookups and 
> inserts.
> I've chosen to implement this initial version as a linked-list node per item, 
> but we can optimise this in future by storing fatter nodes that permit a 
> cache-line's worth of hashes to be checked at once,  further reducing the 
> constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Comment Edited] (CASSANDRA-7282) Faster Memtable map

Reply via email to