[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

Joshua McKenzie (JIRA) Mon, 16 Mar 2015 11:56:10 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363708#comment-14363708
 ]


Joshua McKenzie commented on CASSANDRA-7282:
--------------------------------------------

>From the comments:
bq. We also use a somewhat different explanation here for behaviour, which is 
perhaps easier to comprehend.
+1 for massive understatement. :) The binary reversal "magic" for determining 
bucket to hash into is very well explained in the comments now. That's the 
subtlest part to wrap my head around out of all this and your explanation is 
spot on.

bq. I'd prefer to see it used more widely in the codebase though.
I don't disagree, but when in rome... that, and a piece of code as dense as 
NBHOM can use all the familiarity we can get in here.

So my last few minor outstanding concerns:
# hashCode conformance
bq. The best option is probably to make all Token implement a HashComparable 
interface, and throw UnsupportedOperationException if they don't really 
implement it, but that is also pretty ugly.
Not *too* ugly in my opinion - a little boilerplate, sure, but not ugly. The 
distinction of whether a hashCode implementation conforms to the requirements 
of NBHOM is subtle enough that I believe it warrants codifying.
# "i" value / i(...) is a strange combo. For instance, the following:
{noformat}
int i = i(hash, index);
{noformat}
took me by surprise. It's relatively clear what you're doing when you read the 
surrounding code  - just could use different naming imo.
# maybeResize and the load factor of 66%. Given that it's a lazy, async "move 
the buckets, not the items" approach I can see how it's not going to 
make-or-break at 50% vs. 75%. I'd like *some* kind of data on this though; I'd 
be content with back-of-the-napkin theorization here - just something more than 
"seems pretty reasonable". :)

On the whole - looking very good.

> Faster Memtable map
> -------------------
>
>                 Key: CASSANDRA-7282
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, 
> run1.svg, writes.svg
>
>
> Currently we maintain a ConcurrentSkipLastMap of DecoratedKey -> Partition in 
> our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
> majority of users use a hash partitioner, it occurs to me we could maintain a 
> hybrid ordered list / hash map. The list would impose the normal order on the 
> collection, but a hash index would live alongside as part of the same data 
> structure, simply mapping into the list and permitting O(1) lookups and 
> inserts.
> I've chosen to implement this initial version as a linked-list node per item, 
> but we can optimise this in future by storing fatter nodes that permit a 
> cache-line's worth of hashes to be checked at once,  further reducing the 
> constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

Reply via email to