[
https://issues.apache.org/jira/browse/IGNITE-3588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15394608#comment-15394608
]
Igor Rudyak edited comment on IGNITE-3588 at 7/26/16 9:51 PM:
--------------------------------------------------------------
Valentin,
For *writeAll/readAll*, the Cassandra cache store implementation uses async
operations (http://www.datastax.com/dev/blog/java-driver-async-queries) and
futures, which have the best performance characteristics.
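To illustrate the idea (this is only a rough sketch, not the actual store code): every write is submitted asynchronously and the caller waits on all futures at the end. The class name AsyncWriteAllSketch and the asyncWrite callback are hypothetical stand-ins for the driver's async call, and CompletableFuture is used here instead of the driver's own future type.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

final class AsyncWriteAllSketch {
    /**
     * Submits one asynchronous write per entry, then blocks until all of them finish.
     * asyncWrite is a hypothetical stand-in for an async call to the database.
     */
    static <K, V> void writeAll(Map<K, V> entries,
                                BiFunction<K, V, CompletableFuture<Void>> asyncWrite) {
        List<CompletableFuture<Void>> pending = new ArrayList<>(entries.size());

        // Fire all writes without waiting for the previous one to complete.
        for (Map.Entry<K, V> e : entries.entrySet())
            pending.add(asyncWrite.apply(e.getKey(), e.getValue()));

        // Wait for every write; join() rethrows if any of them failed.
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
    }

    public static void main(String[] args) {
        Map<String, Integer> fakeTable = new ConcurrentHashMap<>(); // stands in for Cassandra
        Map<String, Integer> batch = new HashMap<>();
        batch.put("a", 1);
        batch.put("b", 2);

        writeAll(batch, (k, v) -> CompletableFuture.runAsync(() -> fakeTable.put(k, v)));
        System.out.println(fakeTable);
    }
}
```

The point is that the individual statements run concurrently, so no BATCH is needed to avoid paying one round trip after another.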
The Cassandra BATCH statement is actually quite often an anti-pattern for those
who come from the relational world. The BATCH concept in Cassandra is totally
different from the relational one and is not meant for optimizing batch/bulk
operations. The main purpose of a Cassandra BATCH is to keep denormalized data
in sync, for example when you duplicate the same data into several tables.
Batches are not recommended for any other case:
- https://medium.com/@foundev/cassandra-batch-loading-without-the-batch-keyword-40f00e35e23e#.k4xfir8ij
- http://christopher-batey.blogspot.com/2015/02/cassandra-anti-pattern-misuse-of.html
- https://inoio.de/blog/2016/01/13/cassandra-to-batch-or-not-to-batch/
It's also worth mentioning that in the CassandraCacheStore implementation
(actually in CassandraSessionImpl) every operation against Cassandra is wrapped
in a loop. In case of failure, up to 20 attempts are made to retry the
operation, with incrementally increasing timeouts starting from 100ms and
specific exception handling logic (Cassandra host unavailability, etc.).
This provides a quite reliable persistence mechanism. According to load tests,
even on a heavily overloaded Cassandra cluster (CPU load > 10 per core) there
were no lost writes/reads/deletes, and at most 6 attempts were needed to
perform one operation.
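A minimal sketch of such a retry loop, under stated assumptions: the 20 attempts and the 100ms starting delay come from the description above, while the linear growth of the delay, the class name RetrySketch and the decision to retry on any RuntimeException are illustrative only. The real CassandraSessionImpl has its own exception handling logic, which is not reproduced here.

```java
import java.util.function.Supplier;

final class RetrySketch {
    static final int MAX_ATTEMPTS = 20;         // from the description above
    static final long INITIAL_DELAY_MS = 100L;  // from the description above

    /** Assumption: the delay grows linearly, e.g. 100ms, 200ms, 300ms, ... */
    static long delayForAttempt(int attempt, long initialDelayMs) {
        return initialDelayMs * attempt;
    }

    /** Runs op, retrying after a growing pause until it succeeds or attempts run out. */
    static <T> T executeWithRetry(Supplier<T> op, int maxAttempts, long initialDelayMs) {
        if (maxAttempts < 1)
            throw new IllegalArgumentException("maxAttempts must be >= 1");

        RuntimeException last = null;

        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                // Assumption: every failure is retried; real code would inspect the error.
                last = e;
            }

            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayForAttempt(attempt, initialDelayMs));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting to retry", ie);
                }
            }
        }

        throw last; // all attempts failed
    }

    public static void main(String[] args) {
        int[] calls = {0};

        // Simulated operation that fails twice before succeeding.
        String res = executeWithRetry(() -> {
            if (++calls[0] < 3)
                throw new IllegalStateException("host unavailable");
            return "ok";
        }, MAX_ATTEMPTS, INITIAL_DELAY_MS);

        System.out.println(res + " after " + calls[0] + " attempts");
    }
}
```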
> Cassandra store should use batching in writeAll and deleteAll methods
> ---------------------------------------------------------------------
>
> Key: IGNITE-3588
> URL: https://issues.apache.org/jira/browse/IGNITE-3588
> Project: Ignite
> Issue Type: Improvement
> Components: ignite-cassandra
> Affects Versions: 1.6
> Reporter: Valentin Kulichenko
> Fix For: 1.7
>
>
> In the current implementation, the Cassandra store executes all updates one
> by one when the {{writeAll}} or {{deleteAll}} method is called.
> We should add an option to use {{BatchStatement}} instead.