[
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315157#comment-14315157
]
Benedict commented on CASSANDRA-7282:
-------------------------------------
I've pushed an updated branch
[here|https://github.com/belliottsmith/cassandra/tree/7282-withcomments] that I
hope addresses all of your concerns.
bq. LongNonBlockingHashOrderedMapTest times out on default settings for ant
long-test on my box
The test is really designed to be run on a machine with at least 8 physical
cores, and it may take excessively long on fewer. In general it's hard
to make a test like this terminate within a predictable time horizon, and the
longer it runs the better (ideally we'd just leave it running for days). I've
tweaked the settings slightly so that it runs better on lower core counts while
still performing some useful work.
bq. this.index aliasing in predecessor appears redundant
Aliasing the field into a local variable is necessary to avoid fetching a
(potentially different) version of the index on each reference, and to avoid
paying the memory access cost of re-reading the field each time.
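To illustrate the idea (a toy sketch; the class name, field and data here are invented, not the patch's actual code):

```java
// Toy sketch of aliasing a volatile field into a local variable; names
// are invented and this is not the patch's predecessor() implementation.
public class PredecessorSketch {
    private volatile int[] index = { 2, 3, 5, 8 };

    // One volatile read up front, then cheap local accesses against a
    // single, consistent snapshot of the reference. Without the alias,
    // every 'this.index' reference is a fresh volatile read and could
    // even observe a different version of the index mid-scan.
    int predecessor(int key) {
        int[] index = this.index; // alias the field once
        int result = -1;
        for (int i = 0; i < index.length; i++)
            if (index[i] < key)
                result = index[i];
        return result;
    }
}
```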
bq. Would it be worth making maybeResize threshold configurable? What's our
reasoning for targeting 66%?
66% seems pretty reasonable (defaults are usually somewhere between 50% and 75%
for hash tables). We don't generally expose this kind of internal feature to
users, but I am not totally opposed to doing so. I doubt a value much different
from this is likely to help much, though: a value below 50% could mean the
index becomes more stale, so we have more forward scanning to do on lookups,
whereas a value above 100% means we start seeing significant worst-case
collisions.
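As a concrete illustration of what a 66% threshold means in practice (the method name here is invented, and this is not the patch's maybeResize logic):

```java
// Hypothetical illustration of a 66% resize threshold; the method name
// is invented and this is not the patch's actual resize check.
public class ResizeThreshold {
    // Grow once occupancy exceeds roughly two thirds of capacity,
    // using integer arithmetic to avoid floating point.
    static boolean shouldResize(int size, int capacity) {
        return size * 3 > capacity * 2;
    }
}
```

So with a capacity of 1024, a resize would be scheduled once the table passes about 682 entries.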
bq. in putIfAbsent / maybeResize / resize: Just to confirm - are we relying on
the executor to drop duplicate resize calls on the floor? If so, it might be
worth documenting.
It doesn't matter whether it does or not; the table only ever resizes upwards.
We also shouldn't ever see a duplicate resize call, unless the first hasn't
executed by the time the table's contents have doubled.
bq. Why do we bit shift on generation of index node instead of just using
integer value?
This is a standard practice for me: shifts in multiples of 10 are powers of
~1000 (i.e. 1024). I find it much neater than lots of "magic constants" when
you get above 1024 (e.g. 1048576 or 1073741824), or snippets of 1024L * 1024L
* 1024L: 1 << 30 is much clearer (representing 1024^3). However, it's of pretty
marginal utility when working with small quantities, so I've switched it. I'd
prefer to see it used more widely in the codebase though.
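For reference, the equivalences the shift notation relies on (constant names here are my own, just for illustration):

```java
// Shifts in multiples of 10 are powers of 1024, so the usual "magic
// constants" can be read off directly from the shift amount.
public class ShiftNotation {
    static final int KB = 1 << 10; // 1024
    static final int MB = 1 << 20; // 1048576, i.e. 1024^2
    static final int GB = 1 << 30; // 1073741824, i.e. 1024^3
}
```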
I've spent a while trying to comment very clearly the algorithm at the top of
the class, with diagrams. Please let me know if it's a success, and if there's
anything more I can do to clarify it.
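For anyone following along, the overall shape of the hybrid structure (a toy, single-threaded sketch with invented names; the actual class is lock-free and considerably more involved) is roughly:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy, single-threaded sketch of the hybrid design: a sorted linked list
// preserves key order for iteration, while a hash index maps each key
// straight to its list node for O(1) lookup. All names are invented; the
// real patch uses a lock-free, array-based index, not java.util.HashMap.
public class HybridOrderedMap<K extends Comparable<K>, V> {
    static final class Node<K, V> {
        final K key; V value; Node<K, V> next;
        Node(K key, V value) { this.key = key; this.value = value; }
    }

    private Node<K, V> head;                                  // ordered list
    private final Map<K, Node<K, V>> index = new HashMap<>(); // hash index

    public V get(K key) {                 // O(1): straight to the node
        Node<K, V> n = index.get(key);
        return n == null ? null : n.value;
    }

    public void put(K key, V value) {
        Node<K, V> existing = index.get(key);
        if (existing != null) { existing.value = value; return; }
        Node<K, V> node = new Node<>(key, value);
        // Splice into the ordered list. This walk is O(n) here; the real
        // structure uses the hash index to find the insertion point.
        if (head == null || key.compareTo(head.key) < 0) {
            node.next = head;
            head = node;
        } else {
            Node<K, V> cur = head;
            while (cur.next != null && cur.next.key.compareTo(key) < 0)
                cur = cur.next;
            node.next = cur.next;
            cur.next = node;
        }
        index.put(key, node);
    }

    public List<K> keysInOrder() {        // order comes from the list
        List<K> keys = new ArrayList<>();
        for (Node<K, V> n = head; n != null; n = n.next)
            keys.add(n.key);
        return keys;
    }
}
```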
> Faster Memtable map
> -------------------
>
> Key: CASSANDRA-7282
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Benedict
> Assignee: Benedict
> Labels: performance
> Fix For: 3.0
>
> Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg,
> run1.svg, writes.svg
>
>
> Currently we maintain a ConcurrentSkipListMap of DecoratedKey -> Partition in
> our memtables. Maintaining this is an O(lg(n)) operation; since the vast
> majority of users use a hash partitioner, it occurs to me we could maintain a
> hybrid ordered list / hash map. The list would impose the normal order on the
> collection, but a hash index would live alongside as part of the same data
> structure, simply mapping into the list and permitting O(1) lookups and
> inserts.
> I've chosen to implement this initial version as a linked-list node per item,
> but we can optimise this in future by storing fatter nodes that permit a
> cache-line's worth of hashes to be checked at once, further reducing the
> constant factor costs for lookups.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)