[ https://issues.apache.org/jira/browse/CASSANDRA-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082025#comment-14082025 ]
Robert Stupp commented on CASSANDRA-5959:
-----------------------------------------

Can we resolve this ticket as "later" instead of "duplicate"?
This one covers CQL INSERT syntax, which is different to CASSANDRA-4693 which covers batch prepared statements.

In CASSANDRA-7654 I restricted rows to be in the same partition to keep updates as atomic as possible and to prevent it from being just another syntax for BATCH w/ pstmt.

If the rows are restricted to be in the same partition, it could solve another "issue": deletes always win over inserts/updates (with the same modification timestamp). It could be used to "replace" a whole partition, although I'm not sold on the idea that an INSERT implicitly performs a DELETE.
I think with Thrift it was possible to replace a complete row (not sure - did not work much with Thrift).

> CQL3 support for multi-column insert in a single operation (Batch Insert /
> Batch Mutate)
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-5959
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5959
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core, Drivers (now out of tree)
>            Reporter: Les Hazlewood
>              Labels: CQL
>
> h3. Impetus for this Request
> (from the original [question on
> StackOverflow|http://stackoverflow.com/questions/18522191/using-cassandra-and-cql3-how-do-you-insert-an-entire-wide-row-in-a-single-reque]):
> I want to insert a single row with 50,000 columns into Cassandra 1.2.9.
> Before inserting, I have all the data for the entire row ready to go (in
> memory):
> {code}
> +---------+------+------+------+------+-------+
> |         |  0   |  1   |  2   | ...  | 49999 |
> | row_id  +------+------+------+------+-------+
> |         | text | text | text | ...  | text  |
> +---------+------+------+------+------+-------+
> {code}
> The column names are integers, allowing slicing for pagination. The column
> values are a value at that particular index.
> CQL3 table definition:
> {code}
> create table results (
>     row_id text,
>     index int,
>     value text,
>     primary key (row_id, index)
> )
> with compact storage;
> {code}
> As I already have the row_id and all 50,000 name/value pairs in memory, I
> just want to insert a single row into Cassandra in a single request/operation
> so it is as fast as possible.
> The only thing I can seem to find is to execute the following 50,000 times:
> {code}
> INSERT INTO results (row_id, index, value) values (my_row_id, ?, ?);
> {code}
> where the first {{?}} is an index counter ({{i}}) and the second {{?}} is
> the text value to store at location {{i}}.
> With the Datastax Java Driver client and C* server on the same development
> machine, this took a full minute to execute.
> Oddly enough, the same 50,000 insert statements in a [Datastax Java Driver
> Batch|http://www.datastax.com/drivers/java/apidocs/com/datastax/driver/core/querybuilder/QueryBuilder.html#batch(com.datastax.driver.core.Statement...)]
> on the same machine took 7.5 minutes. I thought batches were supposed to be
> _faster_ than individual inserts?
> We tried instead with a Thrift client (Astyanax) and the same insert via a
> [MutationBatch|http://netflix.github.io/astyanax/javadoc/com/netflix/astyanax/MutationBatch.html].
> This took _235 milliseconds_.
> h3. Feature Request
> As a result of this performance testing, this issue is to request that CQL3
> support batch mutation operations as a single operation (statement) to ensure
> the same speed/performance benefits as existing Thrift clients.
> Example suggested syntax (based on the above example table/column family):
> {code}
> insert into results (row_id, (index,value)) values
>     ((0,text0), (1,text1), (2,text2), ..., (N,textN));
> {code}
> Each value in the {{values}} clause is a tuple. The first tuple element is
> the column name, the second tuple element is the column value.
> This seems to be the most simple/accurate representation of what happens
> during a batch insert/mutate.
> Not having this CQL feature forced us to remove the Datastax Java Driver
> (which we liked) in favor of Astyanax because Astyanax supports this
> behavior. We desire feature/performance parity between Thrift and
> CQL3/Datastax Java Driver, so we hope this request improves both CQL3 and the
> Driver.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
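
For readers following the thread, here is a rough sketch of how the in-memory wide row described in the issue maps onto bind parameters for the per-cell INSERT it quotes. It is only an illustration: the helper names (`wide_row_params`, `chunked`) and the chunk size are hypothetical, `row_id` is bound as a third `?` rather than inlined, and no driver is invoked, so it says nothing about performance. Every tuple carries the same `row_id`, so all cells belong to one partition, which is the restriction mentioned in the comment.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Sequence, Tuple

# Per-cell statement from the issue, with row_id bound as a parameter as well
# (an assumption made here for illustration).
INSERT_CQL = "INSERT INTO results (row_id, index, value) VALUES (?, ?, ?)"

BindTuple = Tuple[str, int, str]


def wide_row_params(row_id: str, values: Sequence[str]) -> Iterator[BindTuple]:
    """Yield one (row_id, index, value) bind tuple per column of the wide row."""
    for index, value in enumerate(values):
        yield (row_id, index, value)


def chunked(params: Iterable[BindTuple], size: int) -> Iterator[List[BindTuple]]:
    """Group bind tuples into lists of at most `size` entries (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(params)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    # 50,000 text values for a single row_id, as in the issue description.
    values = [f"text{i}" for i in range(50_000)]
    groups = list(chunked(wide_row_params("my_row_id", values), 1_000))
    print(INSERT_CQL)
    print(len(groups), "groups; first bind tuple:", groups[0][0])
```

Each group could then be handed to whatever client is in use, either statement by statement or as one same-partition batch; which of the two is faster is exactly what the issue reports as surprising, so it would need to be measured.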