Hi Valentin,

Sounds reasonable. I'll create a ticket to add Cassandra logged batches and will try to prepare some load tests to investigate whether unlogged batches can provide better performance. I'll also add a ticket for RAMP as a long-term enhancement.
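To make the per-partition batching idea concrete, here is a rough sketch of the grouping step such a ticket would need: splitting a putAll entry set into sub-batches that share a partition key, so each logged batch stays on one replica set. The class, method, and partition-key rule below are hypothetical illustrations, not the Cassandra module's actual API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Sketch: group a putAll entry set by Cassandra partition key, so that each
 * group can later be sent as one logged batch to a single replica set.
 * partitionKeyOf stands in for whatever the persistence mapper derives from
 * the cache key (hypothetical, for illustration only).
 */
public class BatchSplitter {
    public static <K, V> Map<Object, Map<K, V>> splitByPartition(
            Map<K, V> entries, Function<K, Object> partitionKeyOf) {
        Map<Object, Map<K, V>> groups = new HashMap<>();
        for (Map.Entry<K, V> e : entries.entrySet()) {
            groups.computeIfAbsent(partitionKeyOf.apply(e.getKey()), pk -> new HashMap<>())
                  .put(e.getKey(), e.getValue());
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<>();
        cache.put("us:1", "a");
        cache.put("us:2", "b");
        cache.put("eu:1", "c");
        // Partition key = prefix before ':' (a made-up affinity rule).
        Map<Object, Map<String, String>> groups =
            splitByPartition(cache, k -> k.split(":")[0]);
        System.out.println(groups.size());           // two partition groups
        System.out.println(groups.get("us").size()); // two entries share "us"
    }
}
```

Each resulting group would then be turned into one logged BATCH (or left as individual async statements when the group has a single entry).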
Igor Rudyak

On Fri, Jul 29, 2016 at 5:45 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:

> Hi Igor,
>
> 1) Yes, I'm talking about splitting the entry set into per-partition (or per-node) batches. Having entries that are stored on different nodes in the same batch doesn't make much sense, of course.
>
> 2) RAMP looks interesting, but it seems to be a pretty complicated task. How about adding support for built-in logged batches (this should be fairly easy to implement) and then improving atomicity as a second phase?
>
> -Val
>
> On Fri, Jul 29, 2016 at 5:19 PM, Igor Rudyak <irud...@gmail.com> wrote:
>
>> Hi Valentin,
>>
>> 1) Regarding unlogged batches, I don't think it makes sense to support them, because:
>> - They are deprecated starting from Cassandra 3.0 (which we are currently using in the Cassandra module).
>> - According to the Cassandra documentation (http://docs.datastax.com/en/cql/3.1/cql/cql_using/useBatch.html), "Batches are often mistakenly used in an attempt to optimize performance". The Cassandra developers say that individual statements, not batches (https://medium.com/@foundev/cassandra-batch-loading-without-the-batch-keyword-40f00e35e23e#.rxkmfe209), are the fastest way to load data. I checked this with batches containing records with different partition keys, and it's definitely true. For a small batch of records that all share the same partition key (affinity in Ignite), batches could provide better performance, but I didn't investigate this case deeply (what the optimal batch size is, how significant the performance benefits are, etc.). I can try to do some load tests to get a better understanding of this.
>>
>> 2) Regarding logged batches, I think it makes sense to support them in the Cassandra module for transactional caches. The bad thing is that they don't provide isolation; the good thing is that they guarantee all your changes will eventually be committed and visible to clients.
>> Thus it's still better than nothing... However, there is a better approach. We can implement a transactional protocol on top of Cassandra, which will give us atomic read isolation - you'll either see all the changes made by a transaction or none of them. For example, we can implement RAMP transactions (http://www.bailis.org/papers/ramp-sigmod2014.pdf), because they provide rather low overhead.
>>
>> Igor Rudyak
>>
>> On Thu, Jul 28, 2016 at 11:00 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
>>
>>> Hi Igor,
>>>
>>> I'm not a big Cassandra expert, but here are my thoughts.
>>>
>>> 1. Sending updates in a batch is always better than sending them one by one. For example, if you do putAll in Ignite with 100 entries, and these entries are split across 5 nodes, the client will send 5 requests instead of 100. This provides a significant performance improvement. Is there a way to use a similar approach in Cassandra?
>>>
>>> 2. As for logged batches, I can easily believe that this is a rarely used feature, but since it exists in Cassandra, I can't find a single reason not to support it in our store as an option. Users who come across those rare cases will only say thank you to us :)
>>>
>>> What do you think?
>>>
>>> -Val
>>>
>>> On Thu, Jul 28, 2016 at 10:41 PM, Igor Rudyak <irud...@gmail.com> wrote:
>>>
>>>> There are actually some cases when atomic read isolation in Cassandra could be important. Let's assume a batch was persisted in Cassandra but not finalized yet - a read operation from Cassandra returns us only the partially committed data of the batch. In such a situation we have problems when:
>>>>
>>>> 1) Some of the batch records have already expired from the Ignite cache and we are reading them from the persistent store (Cassandra in our case).
>>>> 2) All Ignite nodes storing the batch records (or a subset of them) died (or, for example, became unavailable for 10 seconds because of a network problem). While reading such records from the Ignite cache we will be redirected to the persistent store.
>>>>
>>>> 3) Network separation occurred in such a way that we now have two Ignite clusters, but all the replicas of the batch data are located in only one of them. Again, while reading such records from the Ignite cache on the second cluster we will be redirected to the persistent store.
>>>>
>>>> In all the mentioned cases, if the Cassandra batch isn't finalized yet, we will read partially committed transaction data.
>>>>
>>>> On Thu, Jul 28, 2016 at 6:52 AM, Luiz Felipe Trevisan <luizfelipe.trevi...@gmail.com> wrote:
>>>>
>>>> > I totally agree with you regarding the guarantees we have with logged batches, and I'm also pretty much aware of the performance penalty involved in using this solution.
>>>> >
>>>> > But since all read operations are executed via Ignite, it means that isolation at the Cassandra level is not really important. I think the only guarantee really needed is that we don't end up with a partial insert in Cassandra in case we have a failure in Ignite and we lose the node that was responsible for this write operation.
>>>> >
>>>> > My other assumption is that the write operation needs to finish before an eviction happens for this entry and we lose the data in the cache (since a batch doesn't guarantee isolation). However, if we cannot achieve this, I don't see why we'd use Ignite as a cache store.
>>>> >
>>>> > Luiz
>>>> >
>>>> > --
>>>> > Luiz Felipe Trevisan
>>>> >
>>>> > On Wed, Jul 27, 2016 at 4:55 PM, Igor Rudyak <irud...@gmail.com> wrote:
>>>> >
>>>> >> Hi Luiz,
>>>> >>
>>>> >> Logged batches are not the solution for achieving an atomic view of your Ignite transaction changes in Cassandra.
>>>> >>
>>>> >> The problem with logged batches (aka atomic) is that they only guarantee that if any part of the batch succeeds, all of it will; no other transactional enforcement is done at the batch level. For example, there is no batch isolation. Clients are able to read the first updated rows from the batch while other rows are still being updated on the server (in RDBMS terminology this means *READ-UNCOMMITTED* isolation level). Thus Cassandra batches are "atomic" only in the database sense that if any part of the batch succeeds, all of it will.
>>>> >>
>>>> >> Probably the best way to achieve read-atomic isolation for an Ignite transaction persisting data into Cassandra is to implement RAMP transactions (http://www.bailis.org/papers/ramp-sigmod2014.pdf) on top of Cassandra.
>>>> >>
>>>> >> I may create a ticket for this if the community would like it.
>>>> >>
>>>> >> Igor Rudyak
>>>> >>
>>>> >> On Wed, Jul 27, 2016 at 12:55 PM, Luiz Felipe Trevisan <luizfelipe.trevi...@gmail.com> wrote:
>>>> >>
>>>> >>> Hi Igor,
>>>> >>>
>>>> >>> Does it make sense to you to use logged batches to guarantee atomicity in Cassandra in cases where we are doing a cross-cache transaction operation?
>>>> >>>
>>>> >>> Luiz
>>>> >>>
>>>> >>> --
>>>> >>> Luiz Felipe Trevisan
>>>> >>>
>>>> >>> On Wed, Jul 27, 2016 at 2:05 AM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
>>>> >>>
>>>> >>>> I am still very confused. Ilya, can you please explain what happens in Cassandra if a user calls the IgniteCache.putAll(...) method?
>>>> >>>>
>>>> >>>> In Ignite, if putAll(...) is called, Ignite will make the best effort to execute the update as a batch, in which case the performance is better. What is the analogy in Cassandra?
>>>> >>>>
>>>> >>>> D.
>>>> >>>>
>>>> >>>> On Tue, Jul 26, 2016 at 9:16 PM, Igor Rudyak <irud...@gmail.com> wrote:
>>>> >>>>
>>>> >>>> > Dmitriy,
>>>> >>>> >
>>>> >>>> > There is exactly the same approach for all async read/write/delete operations - the Cassandra session just provides an executeAsync(statement) function for all types of operations.
>>>> >>>> >
>>>> >>>> > To be more detailed about Cassandra batches, there are actually two types of batches:
>>>> >>>> >
>>>> >>>> > 1) *Logged batch* (aka atomic) - the main purpose of such batches is to keep duplicated data in sync while updating multiple tables, but at the cost of performance.
>>>> >>>> >
>>>> >>>> > 2) *Unlogged batch* - the only specific case for such a batch is when all updates are addressed to only *one* partition key and the batch has a "*reasonable size*". In such a situation there *could be* performance benefits if you are using the Cassandra *TokenAware* load balancing policy. In this particular case all the updates will go directly, without any additional coordination, to the primary node which is responsible for storing data for this partition key.
>>>> >>>> >
>>>> >>>> > The *generic rule* is that *individual updates in async mode* provide the best performance (https://docs.datastax.com/en/cql/3.1/cql/cql_using/useBatch.html). That's because they spread all updates across the whole cluster.
>>>> >>>> > In contrast to this, when you are using batches, what this actually does is put a huge amount of pressure on a single coordinator node. This is because the coordinator needs to forward each individual insert/update/delete to the correct replicas. In general you're just losing all the benefits of the Cassandra TokenAware load balancing policy when you're updating different partitions in a single round trip to the database.
>>>> >>>> >
>>>> >>>> > Probably the only enhancement which could be done is to split our batch into smaller batches, each of which updates records having the same partition key. In this case it could provide some performance benefits when used in combination with the Cassandra TokenAware policy. But there are several concerns:
>>>> >>>> >
>>>> >>>> > 1) It looks like a rather rare case.
>>>> >>>> > 2) It makes error handling more complex - you just don't know which operations in a batch succeeded and which failed, so you need to retry the whole batch.
>>>> >>>> > 3) Retry logic could produce more load on the cluster - with individual updates you only need to retry the mutations that failed; with batches you need to retry the whole batch.
>>>> >>>> > 4) *Unlogged batch is deprecated in Cassandra 3.0* (https://docs.datastax.com/en/cql/3.3/cql/cql_reference/batch_r.html), which we are currently using for the Ignite Cassandra module.
>>>> >>>> >
>>>> >>>> > Igor Rudyak
>>>> >>>> >
>>>> >>>> > On Tue, Jul 26, 2016 at 4:45 PM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
>>>> >>>> >
>>>> >>>> > > On Tue, Jul 26, 2016 at 5:53 PM, Igor Rudyak <irud...@gmail.com> wrote:
>>>> >>>> > >
>>>> >>>> > >> Hi Valentin,
>>>> >>>> > >>
>>>> >>>> > >> For writeAll/readAll the Cassandra cache store implementation uses async operations (http://www.datastax.com/dev/blog/java-driver-async-queries) and futures, which have the best characteristics in terms of performance.
>>>> >>>> > >>
>>>> >>>> > > Thanks, Igor. This link describes the query operations, but I could not find any mention of writes.
>>>> >>>> > >
>>>> >>>> > >> The Cassandra BATCH statement is actually quite often an anti-pattern for those who come from the relational world. The BATCH statement concept in Cassandra is totally different from the relational one and is not for optimizing batch/bulk operations. The main purpose of a Cassandra BATCH is to keep denormalized data in sync - for example, when you are duplicating the same data into several tables.
>>>> >>>> > >> All other cases are not recommended for Cassandra batches:
>>>> >>>> > >> - https://medium.com/@foundev/cassandra-batch-loading-without-the-batch-keyword-40f00e35e23e#.k4xfir8ij
>>>> >>>> > >> - http://christopher-batey.blogspot.com/2015/02/cassandra-anti-pattern-misuse-of.html
>>>> >>>> > >> - https://inoio.de/blog/2016/01/13/cassandra-to-batch-or-not-to-batch/
>>>> >>>> > >>
>>>> >>>> > >> It's also good to mention that in the CassandraCacheStore implementation (actually in CassandraSessionImpl) every operation against Cassandra is wrapped in a loop. The reason is that in case of failure, up to 20 attempts will be made to retry the operation, with incrementally increasing timeouts starting from 100 ms, plus specific exception handling logic (Cassandra host unavailability, etc.). Thus it provides a quite reliable persistence mechanism. According to load tests, even on a heavily overloaded Cassandra cluster (CPU load > 10 per core) there were no lost writes/reads/deletes, and at most 6 attempts were needed to perform one operation.
>>>> >>>> > >>
>>>> >>>> > > I think that the main point about Cassandra batch operations is not reliability, but performance. If a user batches up 100s of updates in 1 Cassandra batch, then it will be a lot faster than doing them 1-by-1 in Ignite. Wrapping them into an Ignite "putAll(...)" call just seems more logical to me, no?
>>>> >>>> > >
>>>> >>>> > >> Igor Rudyak
>>>> >>>> > >>
>>>> >>>> > >> On Tue, Jul 26, 2016 at 1:58 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
>>>> >>>> > >>
>>>> >>>> > >> > Hi Igor,
>>>> >>>> > >> >
>>>> >>>> > >> > I noticed that the current Cassandra store implementation doesn't support batching for the writeAll and deleteAll methods; it simply executes all updates one by one (asynchronously in parallel).
>>>> >>>> > >> >
>>>> >>>> > >> > I think it can be useful to provide such support, and I created a ticket [1]. Can you please give your input on this? Does it make sense in your opinion?
>>>> >>>> > >> >
>>>> >>>> > >> > [1] https://issues.apache.org/jira/browse/IGNITE-3588
>>>> >>>> > >> >
>>>> >>>> > >> > -Val
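As a side note for readers of the archive: the retry behavior Igor describes for CassandraSessionImpl (each operation wrapped in a loop of up to 20 attempts, with sleeps growing incrementally from 100 ms) boils down to roughly the following shape. The class and constant names here are made up for illustration; the real implementation's structure and exception handling differ:

```java
import java.util.concurrent.Callable;

/**
 * Sketch of a bounded retry loop with linearly increasing backoff:
 * up to 20 attempts, sleeping 100 ms, 200 ms, 300 ms, ... between them.
 * Illustration only; names are hypothetical.
 */
public class RetrySketch {
    static final int MAX_ATTEMPTS = 20;
    static final long BASE_SLEEP_MS = 100;

    public static <T> T execute(Callable<T> op) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;
                // Incrementally increasing sleep before the next attempt.
                Thread.sleep(BASE_SLEEP_MS * (attempt + 1));
            }
        }
        throw last; // all attempts exhausted
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // A fake operation that fails twice, then succeeds on attempt 3.
        String result = execute(() -> {
            if (++calls[0] < 3) throw new RuntimeException("transient failure");
            return "ok";
        });
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

With transient failures this converges quickly (the load tests quoted above saw at most 6 attempts), while a persistently failing operation gives up after the 20th try.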