[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

Benedict (JIRA) Fri, 12 Sep 2014 19:50:34 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14132494#comment-14132494 ]


Benedict commented on CASSANDRA-7282:
-------------------------------------

Yes, we want to restrict the workload to memtables only, and we want to make
them big enough that inserts generate enough numbers to rise above the noise.
So, starting from a fresh cluster, I set memtable_cleanup_threshold to 0.99 and
memtable_allocation_type to offheap_objects, and I run a stress test with only
_one column_ per partition, giving that column a size of 1 (although any size
is fine if it's off-heap). I'm just trying to make the memtable as large as
possible, which is difficult on my 4GB laptop. I then set
memtable_heap_space_in_mb to 1024 (feel free to make it much bigger) and
insert around 5M items. If you stick to one column and make sure your off-heap
space is large enough for everything you insert into it, then 5M items per
node per GB of on-heap space is achievable (my math tells me around 9M should
be possible, but I overshot slightly and decided to be conservative). I then
follow up immediately with a read run hitting a random selection of those PKs.
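As a back-of-the-envelope check, the quoted figures can be read backwards to get the on-heap budget each item is allowed when the values live off-heap. This is only the budget implied by "around 9M" and "5M" in 1024 MB, not a measured per-item overhead; the class and method names are hypothetical.

```java
// Hypothetical helper: arithmetic on the numbers quoted in the comment only.
// The per-item cost is derived from those numbers, not measured.
final class HeapBudget {
    static final long MIB = 1024L * 1024L;

    // On-heap bytes available per item if `items` fit in `heapMiB`.
    static long bytesPerItem(long heapMiB, long items) {
        return heapMiB * MIB / items;
    }

    // How many items fit in `heapMiB` at `bytesPerItem` each.
    static long itemsThatFit(long heapMiB, long bytesPerItem) {
        return heapMiB * MIB / bytesPerItem;
    }

    public static void main(String[] args) {
        // memtable_heap_space_in_mb: 1024, as in the comment
        System.out.println(bytesPerItem(1024, 9_000_000)); // 119: budget implied by "around 9M"
        System.out.println(bytesPerItem(1024, 5_000_000)); // 214: budget at the conservative 5M
        // "feel free to make it much bigger": a 4096 MB heap at the same per-item cost
        System.out.println(itemsThatFit(4096, 119));       // about 36M items
    }
}
```

Since the budget scales linearly, quadrupling memtable_heap_space_in_mb roughly quadruples the item count, which is why a bigger heap gives longer, less noisy insert runs.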

> Faster Memtable map
> -------------------
>
>                 Key: CASSANDRA-7282
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: reads.svg, writes.svg
>
>
> Currently we maintain a ConcurrentSkipListMap of DecoratedKey -> Partition in
> our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
> majority of users use a hash partitioner, it occurs to me we could maintain a 
> hybrid ordered list / hash map. The list would impose the normal order on the 
> collection, but a hash index would live alongside as part of the same data 
> structure, simply mapping into the list and permitting O(1) lookups and 
> inserts.
> I've chosen to implement this initial version as a linked-list node per item, 
> but we can optimise this in future by storing fatter nodes that permit a 
> cache-line's worth of hashes to be checked at once, further reducing the
> constant factor costs for lookups.
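A minimal, single-threaded sketch of the hybrid ordered list / hash map described in the quoted issue, assuming partitions are ordered by a 64-bit token produced by a hash partitioner (tokens are assumed unique here; a real DecoratedKey also breaks ties on the raw key). The class and method names are hypothetical, not Cassandra's. Because the list order is the hash order, the index can be a fixed array of sentinel nodes, one per contiguous range of the token space: a lookup or insert jumps to its bucket's sentinel and walks only the few nodes in that bucket, so with roughly uniform tokens and a bounded number of items per bucket the expected cost is O(1) rather than the skip list's O(lg(n)).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not Cassandra's code): a token-ordered singly linked list
// plus a fixed array of sentinel nodes acting as a hash index into it.
// Single-threaded, no resizing, tokens assumed unique.
final class HashIndexedOrderedList<V> {
    private static final class Node<T> {
        final long token;
        final boolean sentinel; // bucket marker, carries no data
        T value;
        Node<T> next;

        Node(long token, boolean sentinel, T value) {
            this.token = token;
            this.sentinel = sentinel;
            this.value = value;
        }
    }

    private final int bits;
    private final Node<V>[] buckets;

    @SuppressWarnings("unchecked")
    HashIndexedOrderedList(int bits) {
        if (bits < 1 || bits > 30) throw new IllegalArgumentException("bits must be in [1, 30]");
        this.bits = bits;
        this.buckets = (Node<V>[]) new Node[1 << bits];
        // Chain all sentinels into one list in bucket order; data nodes go between them.
        for (int i = 0; i < buckets.length; i++) {
            buckets[i] = new Node<>(0L, true, null);
            if (i > 0) buckets[i - 1].next = buckets[i];
        }
    }

    // Flipping the sign bit makes unsigned order match signed token order, so the
    // top `bits` bits give a bucket index that never decreases as the token grows.
    private int bucketOf(long token) {
        return (int) ((token ^ Long.MIN_VALUE) >>> (64 - bits));
    }

    // Start at the token's bucket sentinel and walk to the last node before `token`,
    // never crossing into the next bucket's sentinel.
    private Node<V> predecessor(long token) {
        Node<V> prev = buckets[bucketOf(token)];
        while (prev.next != null && !prev.next.sentinel && prev.next.token < token) {
            prev = prev.next;
        }
        return prev;
    }

    V get(long token) {
        Node<V> n = predecessor(token).next;
        return (n != null && !n.sentinel && n.token == token) ? n.value : null;
    }

    // Inserts or replaces; returns the previous value, or null if the token was new.
    V put(long token, V value) {
        Node<V> prev = predecessor(token);
        Node<V> n = prev.next;
        if (n != null && !n.sentinel && n.token == token) {
            V old = n.value;
            n.value = value;
            return old;
        }
        Node<V> fresh = new Node<>(token, false, value);
        fresh.next = n;
        prev.next = fresh;
        return null;
    }

    // Ordered iteration is a plain walk of the list, skipping sentinels.
    List<Long> tokensInOrder() {
        List<Long> out = new ArrayList<>();
        for (Node<V> n = buckets[0]; n != null; n = n.next) {
            if (!n.sentinel) out.add(n.token);
        }
        return out;
    }

    public static void main(String[] args) {
        HashIndexedOrderedList<String> m = new HashIndexedOrderedList<>(4);
        m.put(42L, "a");
        m.put(-7L, "b");
        m.put(Long.MAX_VALUE, "c");
        m.put(Long.MIN_VALUE, "d");
        m.put(42L, "a2");
        System.out.println(m.get(42L));        // a2
        System.out.println(m.tokensInOrder()); // [-9223372036854775808, -7, 42, 9223372036854775807]
    }
}
```

The bucket count is fixed in this sketch; a real memtable would also need to grow the index as it fills and to be safe for concurrent writers, as ConcurrentSkipListMap is, and this sketch ignores both.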
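The quoted issue also mentions a future optimisation: fatter nodes that let a cache-line's worth of hashes be checked at once. The sketch below is one hypothetical reading of that idea, not the design in the patch: each node packs up to 8 sorted 64-bit hashes into a single long[] (8 x 8 bytes = 64 bytes, a common cache-line size), so a lookup scans packed primitives instead of following one pointer per item. The JVM does not guarantee that the array is aligned to a cache line, and the loop compares one hash at a time rather than using a vectorised compare, so this only approximates the idea.

```java
import java.util.Arrays;

// Hypothetical "fat" list node: up to WIDTH hashes stored contiguously and kept sorted.
final class FatNode {
    static final int WIDTH = 8; // 8 longs = 64 bytes of hash data
    final long[] hashes = new long[WIDTH];
    final Object[] values = new Object[WIDTH];
    int size;
    FatNode next; // following node in list order

    // Index of `hash`, or -(insertionPoint + 1) if absent (same convention as Arrays.binarySearch).
    int indexOf(long hash) {
        for (int i = 0; i < size; i++) {
            if (hashes[i] == hash) return i;
            if (hashes[i] > hash) return -(i + 1); // sorted, so we can stop early
        }
        return -(size + 1);
    }

    // Inserts or replaces; returns false if the node is full and the hash is new,
    // in which case a caller would split the node (not shown).
    boolean put(long hash, Object value) {
        int i = indexOf(hash);
        if (i >= 0) {
            values[i] = value;
            return true;
        }
        if (size == WIDTH) return false;
        int at = -(i + 1);
        System.arraycopy(hashes, at, hashes, at + 1, size - at);
        System.arraycopy(values, at, values, at + 1, size - at);
        hashes[at] = hash;
        values[at] = value;
        size++;
        return true;
    }

    Object get(long hash) {
        int i = indexOf(hash);
        return i >= 0 ? values[i] : null;
    }

    public static void main(String[] args) {
        FatNode n = new FatNode();
        for (long h : new long[] {30, 10, 20}) n.put(h, "v" + h);
        System.out.println(Arrays.toString(Arrays.copyOf(n.hashes, n.size))); // [10, 20, 30]
        System.out.println(n.get(20)); // v20
    }
}
```

Compared with one linked-list node per item, this trades some shifting on insert for fewer pointer dereferences and better locality on lookup, which is the constant-factor saving the issue anticipates.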



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
