[
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14132953#comment-14132953
]
Benedict edited comment on CASSANDRA-7282 at 9/13/14 9:55 PM:
--------------------------------------------------------------
bq. Since this is the case, should we restrict this method to Murmur tokens
only?
Well, if I can spend the time ensuring safety we should be able to use this for
RandomPartitioner also.
bq. I would still change the condition in the preface to use non-strict
inequality, though, because cropping tokens to 32 bits will introduce
collisions.
-There will be collisions with or without truncation. The fact that there are
collisions doesn't affect the constraint I've imposed upon the data; we assume
nothing about the dataset when two hashCode()s are equal, and simply resort to
the underlying token comparator, so I think the constraint is sufficient.
Non-strict equality is surely more broken, however? It would hold true for
k1:1, k2:2, hash(k1):2, hash(k2):2. We could strengthen it to a bi-directional
implication, i.e. k1 <= k2 <==> k1.hashCode() <= k2.hashCode().-
I just reread your comment and realise you meant the RHS only. Agreed this
should be changed. However we should probably opt for the stronger
bidirectional constraint, since it is still more correct.
bq. I wouldn't even call this a hashCode to avoid the confusion, perhaps an
"ordering key" or "ordering prefix"?
Perhaps. This was the simplest approach, and since it *is* a hash key used to
index a hash table it seems suitable to use hashCode(), and impose the extra
constraint contextually. I'm fairly neutral though; we certainly could
introduce a new interface. I'll see how it looks.
was (Author: benedict):
bq. Since this is the case, should we restrict this method to Murmur tokens
only?
Well, if I can spend the time ensuring safety we should be able to use this for
RandomPartitioner also.
bq. I would still change the condition in the preface to use non-strict
inequality, though, because cropping tokens to 32 bits will introduce
collisions.
-There will be collisions with or without truncation. The fact that there are
collisions doesn't affect the constraint I've imposed upon the data; we assume
nothing about the dataset when two hashCode()s are equal, and simply resort to
the underlying token comparator, so I think the constraint is sufficient.
Non-strict equality is surely more broken, however? It would hold true for
k1:1, k2:2, hash(k1):2, hash(k2):2. We could strengthen it to a bi-directional
implication, i.e. k1 <= k2 <==> k1.hashCode() <= k2.hashCode().-
I just reread your comment and realise you meant the RHS only. Agreed this
should be changed.
bq. I wouldn't even call this a hashCode to avoid the confusion, perhaps an
"ordering key" or "ordering prefix"?
Perhaps. This was the simplest approach, and since it *is* a hash key used to
index a hash table it seems suitable to use hashCode(), and impose the extra
constraint contextually. I'm fairly neutral though; we certainly could
introduce a new interface. I'll see how it looks.
> Faster Memtable map
> -------------------
>
> Key: CASSANDRA-7282
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Benedict
> Assignee: Benedict
> Labels: performance
> Fix For: 3.0
>
> Attachments: profile.yaml, reads.svg, run1.svg, writes.svg
>
>
> Currently we maintain a ConcurrentSkipLastMap of DecoratedKey -> Partition in
> our memtables. Maintaining this is an O(lg(n)) operation; since the vast
> majority of users use a hash partitioner, it occurs to me we could maintain a
> hybrid ordered list / hash map. The list would impose the normal order on the
> collection, but a hash index would live alongside as part of the same data
> structure, simply mapping into the list and permitting O(1) lookups and
> inserts.
> I've chosen to implement this initial version as a linked-list node per item,
> but we can optimise this in future by storing fatter nodes that permit a
> cache-line's worth of hashes to be checked at once, further reducing the
> constant factor costs for lookups.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)