[ 
https://issues.apache.org/jira/browse/CASSANDRA-6737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-6737:
----------------------------------------

    Attachment: 6737.txt

Attaching a patch that makes batch statements create only one CF object and one 
RowMutation object per partition. On a relatively simple benchmark that inserts 
a 10k-row batch into a single partition (using the DataStax Java driver; the 
code is at https://gist.github.com/pcmanus/9098347 and isn't meant to be fancy), 
I get an improvement of up to more than 20x with this patch (a batch insertion 
drops from >1.2 seconds to ~50-100ms).

Note that more optimization can be done for single-partition batches through 
some special casing, but this is a very simple start.
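
To illustrate the idea only (this is not the attached patch, and none of the 
names below are Cassandra classes): a minimal sketch, assuming hypothetical 
`Update` and `PartitionMutation` stand-ins, of grouping the updates of a batch 
by partition key so that one container is allocated per partition rather than 
one per update.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BatchGroupingSketch {
    /** Hypothetical stand-in for one update inside a batch. */
    static final class Update {
        final String partitionKey;
        final String column;
        final String value;

        Update(String partitionKey, String column, String value) {
            this.partitionKey = partitionKey;
            this.column = column;
            this.value = value;
        }
    }

    /** Hypothetical stand-in for the per-partition mutation container. */
    static final class PartitionMutation {
        final String partitionKey;
        final List<Update> updates = new ArrayList<>();

        PartitionMutation(String partitionKey) {
            this.partitionKey = partitionKey;
        }
    }

    /**
     * Groups updates by partition key. A container is allocated only the
     * first time a partition key is seen; later updates to the same
     * partition are appended to the existing container.
     */
    static Map<String, PartitionMutation> groupByPartition(List<Update> batch) {
        Map<String, PartitionMutation> mutations = new LinkedHashMap<>();
        for (Update u : batch) {
            mutations.computeIfAbsent(u.partitionKey, PartitionMutation::new)
                     .updates.add(u);
        }
        return mutations;
    }

    public static void main(String[] args) {
        // 10k updates that all target the same partition, as in the benchmark.
        List<Update> batch = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            batch.add(new Update("p1", "c" + i, "v" + i));
        }
        Map<String, PartitionMutation> mutations = groupByPartition(batch);
        System.out.println(mutations.size() + " mutation(s) for "
                + batch.size() + " updates");
    }
}
```

With all 10k updates on one partition, the sketch allocates a single container 
instead of 10k, which is the kind of per-update allocation the patch is meant 
to avoid.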


> A batch statements on a single partition should not create a new CF object 
> for each update
> ------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6737
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6737
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 2.0.6
>
>         Attachments: 6737.txt
>
>
> BatchStatement creates a new ColumnFamily object (as well as a new 
> RowMutation object) for every update in the batch, even if all those updates 
> are actually on the same partition. This is particularly inefficient when 
> bulkloading data into a single partition (which is not all that uncommon).



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
