[ 
https://issues.apache.org/jira/browse/CASSANDRA-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334432#comment-14334432
 ] 

Jay Patel commented on CASSANDRA-8854:
--------------------------------------

I agree that it feels like it breaks the concept of a batch. 
Looking at it from a different angle, a logged batch in C* is actually 
eventually atomic. In most cases, I think, the use case does not require 
reading the data written by the statements in a logged batch immediately, 
since the batch may be partially executed. The only difference between a 
successfully executed "sync" and "async" logged batch is that with "sync" we 
know upfront which statements failed (and those will succeed eventually 
anyway), whereas with "async" we do not. I think knowing which statements 
failed in a sync logged batch won't help the application much in changing its 
course of action. So I feel an async option for logged batches would be very 
useful (for almost all logged batch use cases) from a performance standpoint, 
without losing atomicity or any other feature of logged batches. Please 
correct me if I have misunderstood anything about logged batches.
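
To make the sync/async distinction concrete, here is a minimal sketch of the 
flow, assuming a toy in-memory coordinator. ToyCoordinator, execute_batch and 
replay are hypothetical names used only for illustration; this is not how the 
C* coordinator is actually implemented.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class ToyCoordinator:
    """Toy in-memory model of an 'async' logged batch (illustration only)."""

    def __init__(self):
        self.batchlog = {}  # batch id -> statements not yet confirmed applied
        self.store = {}     # stands in for the data held by the replicas
        self.pool = ThreadPoolExecutor(max_workers=4)

    def _apply(self, statement):
        key, value = statement
        self.store[key] = value

    def _apply_rest(self, batch_id, statements):
        for statement in statements:
            self._apply(statement)
        # Every statement is applied, so the batchlog entry is no longer needed.
        self.batchlog.pop(batch_id, None)

    def execute_batch(self, sync_statements, async_statements):
        batch_id = uuid.uuid4()
        # 1. Record the whole batch first so it can be replayed after a failure.
        self.batchlog[batch_id] = list(sync_statements) + list(async_statements)
        # 2. Apply the sync statements before responding to the client.
        for statement in sync_statements:
            self._apply(statement)
        # 3. Apply the async statements in the background and return right away.
        pending = self.pool.submit(self._apply_rest, batch_id, list(async_statements))
        return batch_id, pending

    def replay(self):
        # Re-apply any batch still in the log; this is the "eventual" part.
        for batch_id in list(self.batchlog):
            for statement in self.batchlog.get(batch_id, []):
                self._apply(statement)
            self.batchlog.pop(batch_id, None)
```

In this model the caller gets control back after step 2; a failure during 
step 3 only delays the async writes until the next replay, it does not drop 
them.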

A client-side batchlog also sounds like a good idea. However, it may be harder 
for a client-side "batchlog" to guarantee the same level of atomicity that C* 
logged batches provide (the C* logged batch is closer to C*). Another issue is 
that the client-side "batchlog" table/file needs to be persisted in some 
database/filesystem, which will not scale the way C* does and can soon become 
a bottleneck.

Other than this, it seems that batch statements are executed sequentially. If 
so, is it possible to provide an option to execute them in parallel (the first 
statement sequentially and the rest in parallel, if the first is successful)? 
I can track this as another ticket and look into it.
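
As a rough sketch of the "first sequential, rest parallel" idea, assuming each 
statement can be run by some caller-supplied function: run_first_then_parallel 
and execute are hypothetical names, and whether C* really executes batch 
statements sequentially today is my assumption, not something confirmed here.

```python
from concurrent.futures import ThreadPoolExecutor


def run_first_then_parallel(statements, execute, max_workers=4):
    """Run statements[0] alone; only if it succeeds, run the rest concurrently."""
    if not statements:
        return []
    first, rest = statements[0], statements[1:]
    # An exception here propagates, so none of the remaining statements start.
    results = [execute(first)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, whatever the completion order.
        results.extend(pool.map(execute, rest))
    return results
```

If the first statement raises, the caller sees the error and nothing else has 
been attempted.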

Thanks!

> Support for Async Atomic Batch
> ------------------------------
>
>                 Key: CASSANDRA-8854
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8854
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jay Patel
>
> Use cases sometimes demand atomicity (using a C* logged batch) across 
> multiple DML statements; however, to minimize end-user latency, we do not 
> want to wait for all the statements to be executed. 
> For instance, we would like to have something like:
> BEGIN BATCH
>   Sync - INSERT INTO users (userID, name, email) VALUES ('user1', 'first user', '[email protected]');
>   Async - INSERT INTO users_by_name (name, userID) VALUES ('first user', 'user1');
>   Async - INSERT INTO users_by_email (email, userID) VALUES ('[email protected]', 'user1');
>   ..... more Async statements!
> APPLY BATCH;
> Once the batch is serialized to the batchlog table and the sync statements 
> are executed, the coordinator should return a response without waiting for 
> the async batch statements to execute.
> Some of the use cases that we’re working on would benefit significantly 
> in terms of latency reduction. I can take a first cut at it if we don’t see 
> any concerns with supporting it. 
> Also, we need some discussion around specifying a sync/async tag for each 
> statement in the batch.
> Thoughts welcome. Thanks!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
