[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315157#comment-14315157
 ] 

Benedict commented on CASSANDRA-7282:
-------------------------------------

I've pushed an updated branch 
[here|https://github.com/belliottsmith/cassandra/tree/7282-withcomments] that I 
hope addresses all of your concerns.

bq. LongNonBlockingHashOrderedMapTest times out on default settings for ant 
long-test on my box

The test is really designed to be run on a machine with at least 8 physical 
cores, and it may take an excessively long time on fewer. In general it's hard 
to make a test like this terminate in a predictable time horizon, and the 
longer it runs the better (ideally we'd just leave it running for days). I've 
tweaked the settings slightly to make it run better on lower core counts but 
still actually perform some useful work.

bq. this.index aliasing in predecessor appears redundant

This is necessary to save fetching a different version of the index each time, 
and incurring the memory access costs with each reference

bq. Would it be worth making maybeResize threshold configurable? What's our 
reasoning for targeting 66%?

66% seems pretty reasonable (defaults are usually somewhere between 50% and 75% 
for hash tables). We don't generally expose this kind of internal feature to 
users, but I am not totally opposed to doing so. I doubt a value much different 
from this is likely to help much, though; a value of < 50% could mean the index 
becomes more stale and see we have more scanning forwards to do on lookups, 
whereas a value of above 100% means we start seeing significant worst case 
collisions.

bq. in putIfAbsent / maybeResize / resize: Just to confirm - are we relying on 
the executor to drop duplicate resize calls on the floor? If so, it might be 
worth documenting.

It doesn't matter if it does or does not; it only ever resizes upwards. We also 
shouldn't ever have a duplicate resize call, unless we haven't executed by the 
time the table's contents have doubled.

bq. Why do we bit shift on generation of index node instead of just using 
integer value?

This is a standard practice for me: shifts in multiples of 10 are powers of 
~1000 (i.e. 1024). I find it much neater than lots of "magic constants" when 
you get above 1024 (e.g. 1048576 or 1073741824) , or snippets of  1024L * 1024L 
* 1024L. 1 << 30 is much clearer (representing 1024^3). However it's pretty 
marginal utility when working with small quantities, so I've switched it. I'd 
prefer to see it used more widely in the codebase though.

I've spent a while trying to comment very clearly the algorithm at the top of 
the class, with diagrams. Please let me know if it's a success, and if there's 
anything more I can do to clarify it.

> Faster Memtable map
> -------------------
>
>                 Key: CASSANDRA-7282
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, 
> run1.svg, writes.svg
>
>
> Currently we maintain a ConcurrentSkipLastMap of DecoratedKey -> Partition in 
> our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
> majority of users use a hash partitioner, it occurs to me we could maintain a 
> hybrid ordered list / hash map. The list would impose the normal order on the 
> collection, but a hash index would live alongside as part of the same data 
> structure, simply mapping into the list and permitting O(1) lookups and 
> inserts.
> I've chosen to implement this initial version as a linked-list node per item, 
> but we can optimise this in future by storing fatter nodes that permit a 
> cache-line's worth of hashes to be checked at once,  further reducing the 
> constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to