[ https://issues.apache.org/jira/browse/CASSANDRA-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334432#comment-14334432 ]
Jay Patel commented on CASSANDRA-8854:
--------------------------------------
I agree that it feels like it's breaking the concept of a batch.
Looking at it from a different angle, a logged batch in C* is really only
eventually atomic. In most use cases, I think, the application does not need to
read the data written by a logged batch immediately, since at that point the
batch may have been only partially applied. The only difference between a
successfully executed "sync" and "async" logged batch is that with "sync" we
learn upfront which statements failed (and those will eventually succeed anyway
through batchlog replay), whereas with "async" we do not. I don't think knowing
which statements failed in a sync logged batch helps the application much in
changing its course of action. So I feel an async option for logged batches
would be very useful from a performance standpoint (for almost all logged batch
use cases), without losing atomicity or any other feature of logged batches.
Please correct me if I have misunderstood anything about logged batches.
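To make the "eventually atomic" argument concrete, here is a minimal toy simulation, assuming a coordinator that persists the whole batch to a batchlog before applying it and a background replay that re-applies idempotent writes until every mutation has landed. `ToyCoordinator`, `Mutation`, `execute_batch`, `replay` and the `fail_rate` knob are hypothetical names for illustration, not Cassandra's internals. In this toy model the background execution of an "async" batch is folded into `replay`.

```python
import random
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Mutation:
    table: str
    key: str
    value: str

@dataclass
class ToyCoordinator:
    """Toy model of a coordinator with a batchlog (hypothetical, not C* code)."""
    fail_rate: float = 0.5
    rng: random.Random = field(default_factory=lambda: random.Random(42))
    tables: dict = field(default_factory=dict)
    batchlog: dict = field(default_factory=dict)
    next_id: int = 0

    def _apply(self, m):
        # Simulate a replica write that may time out or fail.
        if self.rng.random() < self.fail_rate:
            return False
        self.tables.setdefault(m.table, {})[m.key] = m.value
        return True

    def execute_batch(self, mutations, wait=True):
        """Return the mutations that failed on the first attempt.

        With wait=False ("async") the call returns right after the batchlog
        write, so the caller never learns which mutations failed.
        """
        batch_id = self.next_id
        self.next_id += 1
        self.batchlog[batch_id] = list(mutations)  # persisted before any apply
        if not wait:
            return []
        failed = [m for m in mutations if not self._apply(m)]
        if not failed:
            del self.batchlog[batch_id]  # fully applied, nothing to replay
        return failed

    def replay(self):
        """Re-apply every batch still in the batchlog (writes are idempotent)."""
        for batch_id, muts in list(self.batchlog.items()):
            results = [self._apply(m) for m in muts]  # attempt every mutation
            if all(results):
                del self.batchlog[batch_id]

coord = ToyCoordinator()
batch = [Mutation("users", "user1", "first user"),
         Mutation("users_by_name", "first user", "user1")]
failed_sync = coord.execute_batch(batch, wait=True)    # caller sees failures
failed_async = coord.execute_batch(batch, wait=False)  # caller sees nothing
while coord.batchlog:  # replay eventually drains both batches
    coord.replay()
print(coord.tables)
```

Either way the tables end up in the same state; the sync path only adds the list of first-attempt failures, which is the point made above.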
A client-side batchlog also sounds like a good idea. However, it may be harder
for a client-side "batchlog" to guarantee the same level of atomicity as C*
logged batches, since the C* batchlog lives inside C* itself. Another issue is
that a client-side "batchlog" table/file needs to be persisted in some
database/filesystem that will not scale the way C* does and could soon become a
bottleneck.
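For comparison, a client-side batchlog might look roughly like the sketch below, assuming an append-only local file that records each batch before it is sent and marks it done afterwards, so a restarted client can find unfinished batches. `ClientBatchlog`, `record`, `complete` and `pending` are hypothetical names; the statement strings are placeholders. The sketch also shows the concern raised above: every client needs its own durable storage, and that storage (not C*) is what must keep up with the write rate.

```python
import json
import os
import tempfile

class ClientBatchlog:
    """Hypothetical client-side batchlog backed by an append-only local file."""

    def __init__(self, path):
        self.path = path

    def _append(self, entry):
        # fsync on every entry: durable, but this local disk is now on the
        # write path of every batch the client sends.
        with open(self.path, "a") as f:
            f.write(json.dumps(entry) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def record(self, batch_id, statements):
        """Persist the batch before any statement is sent to the cluster."""
        self._append({"id": batch_id, "stmts": statements})

    def complete(self, batch_id):
        """Mark the batch as fully applied."""
        self._append({"id": batch_id, "done": True})

    def pending(self):
        """Return batches that were recorded but never marked complete."""
        open_batches = {}
        if not os.path.exists(self.path):
            return open_batches
        with open(self.path) as f:
            for line in f:
                entry = json.loads(line)
                if entry.get("done"):
                    open_batches.pop(entry["id"], None)
                else:
                    open_batches[entry["id"]] = entry["stmts"]
        return open_batches

path = os.path.join(tempfile.mkdtemp(), "batchlog.jsonl")
log = ClientBatchlog(path)
log.record(1, ["INSERT ... users", "INSERT ... users_by_name"])
log.record(2, ["INSERT ... users_by_email"])
log.complete(1)
print(log.pending())  # batch 2 would have to be replayed after a client restart
```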
Other than this, it seems that batch statements are executed sequentially. If
so, would it be possible to provide an option to execute them in parallel (the
first statement sequentially and, if it succeeds, the rest in parallel)? I can
track this as a separate ticket and look into it.
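The proposed execution order (first statement alone, then the rest concurrently if it succeeds) can be sketched with `asyncio`, assuming a hypothetical `execute` coroutine that stands in for sending one statement to its replicas and always succeeds here. This only illustrates the ordering, not how the coordinator actually dispatches mutations.

```python
import asyncio

async def execute(stmt, log):
    """Hypothetical stand-in for sending one statement to its replicas."""
    await asyncio.sleep(0.01)  # simulated network round trip
    log.append(stmt)
    return True

async def run_batch(statements, log):
    first, rest = statements[0], statements[1:]
    # Run the first statement on its own and stop if it fails.
    if not await execute(first, log):
        return False
    # Fan out the remaining statements concurrently.
    results = await asyncio.gather(*(execute(s, log) for s in rest))
    return all(results)

stmts = ["INSERT users", "INSERT users_by_name", "INSERT users_by_email"]
log = []
ok = asyncio.run(run_batch(stmts, log))
print(ok, log[0])
```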
Thanks!
> Support for Async Atomic Batch
> ------------------------------
>
> Key: CASSANDRA-8854
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8854
> Project: Cassandra
> Issue Type: New Feature
> Reporter: Jay Patel
>
> Some use cases demand atomicity (using a C* logged batch) across multiple
> DML statements but, to minimize end-user latency, do not want to wait for
> all the statements to be executed.
> For instance, we would like to have something like:
> BEGIN BATCH
> Sync - INSERT INTO users (userID, name, email) VALUES ('user1', 'first user', '[email protected]');
> Async - INSERT INTO users_by_name (name, userID) VALUES ('first user', 'user1');
> Async - INSERT INTO users_by_email (email, userID) VALUES ('[email protected]', 'user1');
> ... more Async statements!
> APPLY BATCH;
> Once the batch is serialized to the batchlog table and the sync statements
> are executed, the coordinator should return a response without waiting for
> the async statements in the batch to execute.
> Some of the use cases we're working on would benefit significantly from the
> latency reduction. I can take a first cut at it if there are no concerns
> about supporting it.
> We also need some discussion about how to specify the sync/async tag for
> each statement in the batch.
> Thoughts welcome. Thanks!
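The coordinator behavior described in the issue (serialize to the batchlog, run the Sync statements, acknowledge, then finish the Async ones in the background) can be sketched as a toy `asyncio` model. `Statement.sync` is a hypothetical per-statement tag, since the ticket leaves the syntax open, and `ToyCoordinator`, `execute_batch` and `_finish` are illustrative names, not Cassandra code. Statements never fail here; in a real system a failure would leave the batchlog entry in place for replay.

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Statement:
    cql: str
    sync: bool  # hypothetical per-statement Sync/Async tag

class ToyCoordinator:
    def __init__(self):
        self.batchlog = {}
        self.applied = []
        self.background = set()

    async def _execute(self, stmt):
        await asyncio.sleep(0.01)  # simulated replica round trip
        self.applied.append(stmt.cql)

    async def _finish(self, batch_id, stmts):
        await asyncio.gather(*(self._execute(s) for s in stmts))
        # Every statement applied: the batchlog entry can be dropped.
        del self.batchlog[batch_id]

    async def execute_batch(self, batch_id, stmts):
        # 1. Serialize the whole batch to the batchlog first.
        self.batchlog[batch_id] = stmts
        # 2. Run the Sync statements and wait for them.
        await asyncio.gather(*(self._execute(s) for s in stmts if s.sync))
        # 3. Schedule the Async statements and acknowledge without waiting.
        task = asyncio.create_task(
            self._finish(batch_id, [s for s in stmts if not s.sync]))
        self.background.add(task)  # keep a reference so the task is not lost
        task.add_done_callback(self.background.discard)
        return "ACK"

async def main():
    coord = ToyCoordinator()
    stmts = [Statement("INSERT INTO users ...", sync=True),
             Statement("INSERT INTO users_by_name ...", sync=False),
             Statement("INSERT INTO users_by_email ...", sync=False)]
    ack = await coord.execute_batch(1, stmts)
    applied_at_ack = list(coord.applied)  # only the Sync statement so far
    await asyncio.gather(*coord.background)  # let the Async work finish
    return coord, ack, applied_at_ack

coord, ack, applied_at_ack = asyncio.run(main())
print(ack, applied_at_ack, coord.applied)
```

The client-visible latency covers only the batchlog write and the Sync statements; how the Sync/Async tag would be expressed in CQL is exactly the open question in the issue.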
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)