[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14132523#comment-14132523
 ] 

Benedict commented on CASSANDRA-7282:
-------------------------------------

The new code is not significantly more complex. In fact, its arguably less 
complex, the only difference being some of that complexity is not derived from 
an external source (CSLM is significantly more complex than the NBHOM). However 
it is a pretty simple piece of code.

The point of isolating is that we cannot possibly test all possible real world 
use cases. DSE, for instance, delivers Memory-Only SSTables, which depend on 
memtables, so all workloads hitting these would be affected fully by this 
change. All other workloads will be hit to different degrees.

In general we have to make a judgement call about how common such workloads 
are, vs how significant the effect is for those workloads. However trying to 
define a success metric based on a narrow definition of 'realistic workloads' 
that do not isolate the behaviour doesn't buy us anything in my book. Our 
hardware and workload definitions are not sufficiently universal. If we see a 
15% bump for accesses to memtables, that is IMO worth incorporating, esp. as 
the bump will increase over time. As we move memtables fully offheap we can 
support larger and larger memtables, and since the algorithmic quality of this 
change is that the performance does not increase as memtable size grows, 
whereas CSLM grows logarithmically, this change also has the likelihood of 
paying further future dividends.



> Faster Memtable map
> -------------------
>
>                 Key: CASSANDRA-7282
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: reads.svg, writes.svg
>
>
> Currently we maintain a ConcurrentSkipLastMap of DecoratedKey -> Partition in 
> our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
> majority of users use a hash partitioner, it occurs to me we could maintain a 
> hybrid ordered list / hash map. The list would impose the normal order on the 
> collection, but a hash index would live alongside as part of the same data 
> structure, simply mapping into the list and permitting O(1) lookups and 
> inserts.
> I've chosen to implement this initial version as a linked-list node per item, 
> but we can optimise this in future by storing fatter nodes that permit a 
> cache-line's worth of hashes to be checked at once,  further reducing the 
> constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to