[jira] [Commented] (CASSANDRA-5959) CQL3 support for multi-column insert in a single operation (Batch Insert / Batch Mutate)

Aleksey Yeschenko (JIRA) Fri, 01 Aug 2014 06:30:20 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082234#comment-14082234
 ]


Aleksey Yeschenko commented on CASSANDRA-5959:
----------------------------------------------

[~snazy] they will be atomic, because C* merges them all into a single 
Mutation before applying it (as long as they have the same partition key). You 
can also assign different timestamps to different statements to avoid 'the 
issue', and the batch will still be atomic.
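
A minimal sketch of such a batch against the {{results}} table from the 
description, with made-up values and timestamps ({{index}} is double-quoted 
on the assumption that it is treated as a reserved word). Both statements 
target the same partition key, and the batch itself carries no timestamp so 
that each statement can set its own:
{code}
BEGIN BATCH
  -- same row_id (partition key) in every statement
  INSERT INTO results (row_id, "index", value) VALUES ('row-1', 0, 'text0')
    USING TIMESTAMP 1406899820000000;
  INSERT INTO results (row_id, "index", value) VALUES ('row-1', 1, 'text1')
    USING TIMESTAMP 1406899820000001;
APPLY BATCH;
{code}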

> CQL3 support for multi-column insert in a single operation (Batch Insert / 
> Batch Mutate)
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-5959
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5959
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core, Drivers (now out of tree)
>            Reporter: Les Hazlewood
>              Labels: CQL
>
> h3. Impetus for this Request
> (from the original [question on 
> StackOverflow|http://stackoverflow.com/questions/18522191/using-cassandra-and-cql3-how-do-you-insert-an-entire-wide-row-in-a-single-reque]):
> I want to insert a single row with 50,000 columns into Cassandra 1.2.9. 
> Before inserting, I have all the data for the entire row ready to go (in 
> memory):
> {code}
> +---------+------+------+------+------+-------+
> |         | 0    | 1    | 2    | ...  | 49999 |
> | row_id  +------+------+------+------+-------+
> |         | text | text | text | ...  | text  |
> +---------+------+------+------+------+-------+
> {code}
> The column names are integers, allowing slicing for pagination. The column 
> values are a value at that particular index.
> CQL3 table definition:
> {code}
> create table results (
>     row_id text,
>     index int,
>     value text,
>     primary key (row_id, index)
> ) 
> with compact storage;
> {code}
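> A minimal sketch of the slicing this layout allows, assuming a hypothetical 
> row id {{'row-1'}} and an arbitrary page of 100 entries ({{index}} is 
> double-quoted on the assumption that it is treated as a reserved word):
> {code}
> -- Read one page of the wide row: entries 100..199 of partition 'row-1'.
> SELECT "index", value
>   FROM results
>  WHERE row_id = 'row-1'
>    AND "index" >= 100
>    AND "index" < 200;
> {code}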
> As I already have the row_id and all 50,000 name/value pairs in memory, I 
> just want to insert a single row into Cassandra in a single request/operation 
> so it is as fast as possible.
> The only thing I can seem to find is to execute the following 50,000 times:
> {code}
> INSERT INTO results (row_id, index, value) values (my_row_id, ?, ?);
> {code}
> where the first {{?}} is an index counter ({{i}}) and the second {{?}} is 
> the text value to store at location {{i}}.
> With the Datastax Java Driver client and C* server on the same development 
> machine, this took a full minute to execute.
> Oddly enough, the same 50,000 insert statements in a [Datastax Java Driver 
> Batch|http://www.datastax.com/drivers/java/apidocs/com/datastax/driver/core/querybuilder/QueryBuilder.html#batch(com.datastax.driver.core.Statement...)]
>  on the same machine took 7.5 minutes.  I thought batches were supposed to be 
> _faster_ than individual inserts?
> We tried instead with a Thrift client (Astyanax) and the same insert via a 
> [MutationBatch|http://netflix.github.io/astyanax/javadoc/com/netflix/astyanax/MutationBatch.html].
>   This took _235 milliseconds_.
> h3. Feature Request
> As a result of this performance testing, this issue is to request that CQL3 
> support batch mutation operations as a single operation (statement) to ensure 
> the same speed/performance benefits as existing Thrift clients.
> Example suggested syntax (based on the above example table/column family):
> {code}
> insert into results (row_id, (index,value)) values 
>     ((0,text0), (1,text1), (2,text2), ..., (N,textN));
> {code}
> Each value in the {{values}} clause is a tuple.  The first tuple element is 
> the column name, the second tuple element is the column value.  This seems to 
> be the most simple/accurate representation of what happens during a batch 
> insert/mutate.
> Not having this CQL feature forced us to remove the Datastax Java Driver 
> (which we liked) in favor of Astyanax because Astyanax supports this 
> behavior.  We desire feature/performance parity between Thrift and 
> CQL3/Datastax Java Driver, so we hope this request improves both CQL3 and the 
> Driver.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
