[
https://issues.apache.org/jira/browse/CASSANDRA-6737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sylvain Lebresne updated CASSANDRA-6737:
----------------------------------------
Attachment: 6737.txt
Attaching a patch that makes batch statements create only one CF and one
RowMutation object per partition. On a relatively simple benchmark that inserts
a 10k-row batch into a single partition (using the DataStax Java driver; code here:
https://gist.github.com/pcmanus/9098347, it isn't meant to be fancy), I get an
improvement of up to 20x or more with this patch (batch insertion time drops
from >1.2 seconds to ~50-100ms).
Note that more optimization can be done for single-partition batches through
some special casing, but this is a very simple start.
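
To illustrate the idea (this is not the code from 6737.txt), here is a minimal
standalone sketch of grouping a batch's updates by partition key so that each
partition gets a single mutation object. The Update and Mutation classes and
the groupByPartition method are hypothetical stand-ins invented for this
example; they are not Cassandra's ColumnFamily, RowMutation, or BatchStatement
classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BatchGrouping {

    /** Hypothetical stand-in for one update in a batch: a partition key plus one cell. */
    static final class Update {
        final String partitionKey;
        final String column;
        final String value;

        Update(String partitionKey, String column, String value) {
            this.partitionKey = partitionKey;
            this.column = column;
            this.value = value;
        }
    }

    /** Hypothetical stand-in for a per-partition mutation that accumulates cells. */
    static final class Mutation {
        final String partitionKey;
        final Map<String, String> cells = new LinkedHashMap<>();

        Mutation(String partitionKey) {
            this.partitionKey = partitionKey;
        }
    }

    /**
     * Builds one Mutation per distinct partition key and adds every update
     * for that key to it, instead of allocating a new Mutation per update.
     */
    static Map<String, Mutation> groupByPartition(List<Update> batch) {
        Map<String, Mutation> byKey = new LinkedHashMap<>();
        for (Update u : batch) {
            // Reuse the existing mutation for this partition, or create it once.
            Mutation m = byKey.computeIfAbsent(u.partitionKey, Mutation::new);
            m.cells.put(u.column, u.value);
        }
        return byKey;
    }

    public static void main(String[] args) {
        // A 10k-update batch that targets a single partition.
        List<Update> batch = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            batch.add(new Update("p1", "c" + i, "v" + i));
        }
        Map<String, Mutation> grouped = groupByPartition(batch);
        System.out.println(grouped.size() + " mutation(s), "
                + grouped.get("p1").cells.size() + " cells");
    }
}
```

With this shape, the number of mutation objects depends on the number of
distinct partitions in the batch rather than on the number of updates.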
> A batch statements on a single partition should not create a new CF object
> for each update
> ------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-6737
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6737
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Sylvain Lebresne
> Assignee: Sylvain Lebresne
> Fix For: 2.0.6
>
> Attachments: 6737.txt
>
>
> BatchStatement creates a new ColumnFamily object (as well as a new
> RowMutation object) for every update in the batch, even if all those updates
> are actually on the same partition. This is particularly inefficient when
> bulk-loading data into a single partition (which is not all that uncommon).