[ https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14132523#comment-14132523 ]
Benedict commented on CASSANDRA-7282: ------------------------------------- The new code is not significantly more complex. In fact, its arguably less complex, the only difference being some of that complexity is not derived from an external source (CSLM is significantly more complex than the NBHOM). However it is a pretty simple piece of code. The point of isolating is that we cannot possibly test all possible real world use cases. DSE, for instance, delivers Memory-Only SSTables, which depend on memtables, so all workloads hitting these would be affected fully by this change. All other workloads will be hit to different degrees. In general we have to make a judgement call about how common such workloads are, vs how significant the effect is for those workloads. However trying to define a success metric based on a narrow definition of 'realistic workloads' that do not isolate the behaviour doesn't buy us anything in my book. Our hardware and workload definitions are not sufficiently universal. If we see a 15% bump for accesses to memtables, that is IMO worth incorporating, esp. as the bump will increase over time. As we move memtables fully offheap we can support larger and larger memtables, and since the algorithmic quality of this change is that the performance does not increase as memtable size grows, whereas CSLM grows logarithmically, this change also has the likelihood of paying further future dividends. > Faster Memtable map > ------------------- > > Key: CASSANDRA-7282 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7282 > Project: Cassandra > Issue Type: Improvement > Components: Core > Reporter: Benedict > Assignee: Benedict > Labels: performance > Fix For: 3.0 > > Attachments: reads.svg, writes.svg > > > Currently we maintain a ConcurrentSkipLastMap of DecoratedKey -> Partition in > our memtables. Maintaining this is an O(lg(n)) operation; since the vast > majority of users use a hash partitioner, it occurs to me we could maintain a > hybrid ordered list / hash map. The list would impose the normal order on the > collection, but a hash index would live alongside as part of the same data > structure, simply mapping into the list and permitting O(1) lookups and > inserts. > I've chosen to implement this initial version as a linked-list node per item, > but we can optimise this in future by storing fatter nodes that permit a > cache-line's worth of hashes to be checked at once, further reducing the > constant factor costs for lookups. -- This message was sent by Atlassian JIRA (v6.3.4#6332)