[jira] [Comment Edited] (IGNITE-3588) Cassandra store should use batching in writeAll and deleteAll methods

Igor Rudyak (JIRA) Tue, 26 Jul 2016 14:53:26 -0700

    [ 
https://issues.apache.org/jira/browse/IGNITE-3588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15394608#comment-15394608
 ]


Igor Rudyak edited comment on IGNITE-3588 at 7/26/16 9:51 PM:
--------------------------------------------------------------

Valentin,

For *writeAll/readAll*, the Cassandra cache store implementation uses async 
operations (http://www.datastax.com/dev/blog/java-driver-async-queries) and 
futures, which gives the best performance characteristics. 
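
To illustrate the async-plus-futures approach, here is a minimal sketch. The 
AsyncSession interface and the in-memory "table" are hypothetical stand-ins 
used only to keep the example self-contained; they are not the DataStax 
driver API or the actual Ignite code. The point is only the shape: submit 
every write without blocking, then wait on all the futures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

class AsyncWriteAllSketch {
    /** Hypothetical stand-in for a Cassandra session; not the driver's API. */
    interface AsyncSession {
        CompletableFuture<Void> writeAsync(String key, String value);
    }

    /** Submits every write without blocking, then waits for all of them. */
    static void writeAll(AsyncSession session, Map<String, String> entries) {
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (Map.Entry<String, String> e : entries.entrySet()) {
            futures.add(session.writeAsync(e.getKey(), e.getValue()));
        }
        // join() throws CompletionException if any of the writes failed.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
    }

    public static void main(String[] args) {
        // In-memory map standing in for a Cassandra table (assumption).
        Map<String, String> table = new ConcurrentHashMap<>();
        AsyncSession session =
            (k, v) -> CompletableFuture.runAsync(() -> table.put(k, v));

        writeAll(session, Map.of("k1", "v1", "k2", "v2", "k3", "v3"));
        System.out.println("rows written: " + table.size());
    }
}
```

All writes are in flight at the same time, so the total latency is closer to 
that of the slowest single write than to the sum of all of them.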

The Cassandra BATCH statement is quite often an anti-pattern for those who 
come from the relational world. The BATCH concept in Cassandra is entirely 
different from its relational counterpart and is not meant for optimizing 
batch/bulk operations. The main purpose of a Cassandra BATCH is to keep 
denormalized data in sync, for example when you duplicate the same data into 
several tables. Batches are not recommended for any other case: 
 - https://medium.com/@foundev/cassandra-batch-loading-without-the-batch-keyword-40f00e35e23e#.k4xfir8ij
 - http://christopher-batey.blogspot.com/2015/02/cassandra-anti-pattern-misuse-of.html
 - https://inoio.de/blog/2016/01/13/cassandra-to-batch-or-not-to-batch/
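
As an illustration of the one case where BATCH does fit, the sketch below 
builds a logged CQL batch that writes the same record into two denormalized 
tables. The table names (users_by_id, users_by_email) and the loggedBatch 
helper are hypothetical and exist only for this example.

```java
import java.util.List;

class DenormalizedBatchSketch {
    /** Wraps the given CQL statements in a logged batch. */
    static String loggedBatch(List<String> statements) {
        StringBuilder cql = new StringBuilder("BEGIN BATCH\n");
        for (String s : statements) {
            cql.append("  ").append(s).append(";\n");
        }
        return cql.append("APPLY BATCH;").toString();
    }

    public static void main(String[] args) {
        // The same user is stored twice, keyed differently in each table.
        System.out.println(loggedBatch(List.of(
            "INSERT INTO users_by_id (id, email) VALUES (?, ?)",
            "INSERT INTO users_by_email (email, id) VALUES (?, ?)")));
    }
}
```

Here the batch is used for consistency between the two tables, not for 
throughput, which is the distinction the articles above draw.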

It is also worth mentioning that in the CassandraCacheStore implementation 
(actually in CassandraSessionImpl) every operation against Cassandra is 
wrapped in a loop. In case of failure, up to 20 attempts are made to retry 
the operation, with incrementally increasing timeouts starting from 100ms and 
specific exception handling logic (Cassandra host unavailability, etc.). 
This provides a quite reliable persistence mechanism. According to load 
tests, even on a heavily overloaded Cassandra cluster (CPU load > 10 per 
core) there were no lost writes/reads/deletes, and no operation needed more 
than 6 attempts.
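
The retry loop described above can be sketched roughly as follows. This is 
not the CassandraSessionImpl code: the class and method names are made up, 
the linear growth of the delay (100ms, 200ms, 300ms, ...) is an assumption 
about what "incrementally increasing" means, and retrying on any 
RuntimeException is a simplification of the exception-specific handling.

```java
import java.util.function.Supplier;

class RetrySketch {
    static final int MAX_ATTEMPTS = 20;     // attempt limit from the comment
    static final long BASE_DELAY_MS = 100;  // initial timeout from the comment

    /** Delay after the given 1-based failed attempt (linear growth is an assumption). */
    static long delayMs(int attempt, long baseDelayMs) {
        return baseDelayMs * attempt;
    }

    /** Runs op, retrying on failure until it succeeds or attempts run out. */
    static <T> T withRetry(Supplier<T> op, int maxAttempts, long baseDelayMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                // Simplification: the real code inspects the exception type.
                last = e;
                if (attempt == maxAttempts) {
                    break;
                }
                try {
                    Thread.sleep(delayMs(attempt, baseDelayMs));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", ie);
                }
            }
        }
        throw last; // every attempt failed
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated operation that fails twice before succeeding.
        String result = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("host unavailable");
            }
            return "ok";
        }, MAX_ATTEMPTS, BASE_DELAY_MS);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```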


> Cassandra store should use batching in writeAll and deleteAll methods
> ---------------------------------------------------------------------
>
>                 Key: IGNITE-3588
>                 URL: https://issues.apache.org/jira/browse/IGNITE-3588
>             Project: Ignite
>          Issue Type: Improvement
>          Components: ignite-cassandra
>    Affects Versions: 1.6
>            Reporter: Valentin Kulichenko
>             Fix For: 1.7
>
>
> In the current implementation, the Cassandra store executes all updates one 
> by one when the {{writeAll}} or {{deleteAll}} method is called.
> We should add an option to use {{BatchStatement}} instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
