[ 
https://issues.apache.org/jira/browse/CASSANDRA-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974698#action_12974698
 ] 

Cliff Moon commented on CASSANDRA-1891:
---------------------------------------

You are correct that CSLM does not override putAll, so there is no benefit to 
be had with that approach.  That's why this patch uses the CSLM constructor 
that takes a SortedMap.  Internally this constructor invokes buildFromSorted, 
which iterates over the sorted map once and builds the internal structures of 
the CSLM without searching for each entry's insertion point.  In my use case 
I have rather large supercolumns that contain on the order of 100,000 
subcolumns.  With the patch I see throughput improve by 10 to 15% when 
deserializing the supercolumns from disk.
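
To make the two code paths concrete, here is a minimal sketch using only 
java.util classes.  The class and method names (CslmBulkLoad, loadOneByOne, 
loadBulk) are made up for illustration and are not part of the patch, and a 
TreeMap stands in for the already-sorted on-disk data.  It only shows that 
both paths yield the same map; it makes no timing claim.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class CslmBulkLoad {
    // Path 1: insert pre-sorted entries one at a time. Each put() has to
    // locate the insertion point inside the skip list.
    public static ConcurrentSkipListMap<Integer, String> loadOneByOne(
            SortedMap<Integer, String> sorted) {
        ConcurrentSkipListMap<Integer, String> map =
                new ConcurrentSkipListMap<Integer, String>();
        for (Map.Entry<Integer, String> e : sorted.entrySet()) {
            map.put(e.getKey(), e.getValue());
        }
        return map;
    }

    // Path 2: hand the whole SortedMap to the constructor, which builds the
    // skip list in a single pass over the already-ordered entries.
    public static ConcurrentSkipListMap<Integer, String> loadBulk(
            SortedMap<Integer, String> sorted) {
        return new ConcurrentSkipListMap<Integer, String>(sorted);
    }

    public static void main(String[] args) {
        // Assumption: 100,000 entries, matching the order of magnitude
        // mentioned above; the keys and values are placeholder data.
        SortedMap<Integer, String> source = new TreeMap<Integer, String>();
        for (int i = 0; i < 100000; i++) {
            source.put(i, "subcolumn-" + i);
        }
        ConcurrentSkipListMap<Integer, String> a = loadOneByOne(source);
        ConcurrentSkipListMap<Integer, String> b = loadBulk(source);
        System.out.println("same contents: " + a.equals(b));
    }
}
```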

Also it's unclear to me what benefit is to be had from using a TreeMap.  It's 
more efficient to just deserialize directly into the CSLM, which is what 
ColumnSortedMap enables.  It isn't a full implementation of SortedMap, just 
enough to enable the correct behavior in CSLM.
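
As a rough illustration of the "just enough SortedMap" idea, the sketch below 
feeds entries straight into the CSLM constructor from an iterator, with no 
intermediate map.  StreamingSortedMap is a hypothetical stand-in, not the 
ColumnSortedMap from the patch, and the generated keys and values are 
placeholders for deserialized columns.  It assumes the constructor only calls 
comparator() and entrySet().iterator() on its argument, which is an OpenJDK 
implementation detail rather than a documented guarantee.

```java
import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.Comparator;
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.SortedMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class StreamingSortedMapSketch {
    /** Hypothetical partial SortedMap that yields keys 0..count-1 in order. */
    public static final class StreamingSortedMap extends AbstractMap<Integer, String>
            implements SortedMap<Integer, String> {
        private final int count;

        public StreamingSortedMap(int count) { this.count = count; }

        // null means natural ordering of the keys.
        @Override public Comparator<? super Integer> comparator() { return null; }

        @Override public Set<Map.Entry<Integer, String>> entrySet() {
            return new AbstractSet<Map.Entry<Integer, String>>() {
                @Override public int size() { return count; }
                @Override public Iterator<Map.Entry<Integer, String>> iterator() {
                    return new Iterator<Map.Entry<Integer, String>>() {
                        private int i = 0;
                        @Override public boolean hasNext() { return i < count; }
                        @Override public Map.Entry<Integer, String> next() {
                            if (!hasNext()) throw new NoSuchElementException();
                            Integer key = i++;
                            // A real implementation would read the next column here.
                            return new AbstractMap.SimpleImmutableEntry<Integer, String>(
                                    key, "col" + key);
                        }
                        @Override public void remove() {
                            throw new UnsupportedOperationException();
                        }
                    };
                }
            };
        }

        // The rest of the SortedMap contract is deliberately left unsupported.
        @Override public SortedMap<Integer, String> subMap(Integer from, Integer to) {
            throw new UnsupportedOperationException();
        }
        @Override public SortedMap<Integer, String> headMap(Integer to) {
            throw new UnsupportedOperationException();
        }
        @Override public SortedMap<Integer, String> tailMap(Integer from) {
            throw new UnsupportedOperationException();
        }
        @Override public Integer firstKey() { throw new UnsupportedOperationException(); }
        @Override public Integer lastKey() { throw new UnsupportedOperationException(); }
    }

    // Build the CSLM directly from the streaming view.
    public static ConcurrentSkipListMap<Integer, String> build(int count) {
        return new ConcurrentSkipListMap<Integer, String>(new StreamingSortedMap(count));
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<Integer, String> map = build(100000);
        System.out.println("entries loaded: " + map.size());
    }
}
```

The entries must come out of the iterator already in comparator order, since 
the bulk constructor does not re-sort them.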

As for the test failures, I've never been able to get Cassandra's unit tests to 
work locally, so I had always assumed they were simply ornamental.

> large supercolumn deserialization invokes CSLM worst case scenario
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-1891
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1891
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Cliff Moon
>            Priority: Minor
>             Fix For: 0.7.1
>
>         Attachments: supercolumn.patch
>
>
> SuperColumn deserialization hits a worst case insert scenario for CSLM: 
> inserting pre-sorted entries one at a time.  Inside of CSLM this requires 
> scanning to the end of the list and doing a comparison at every step for 
> every item inserted.  This patch supplies a SortedMap interface to the 
> supercolumn deserialization.  CSLM will do a bulk insert from a SortedMap 
> interface supplied in the constructor.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
