On 21 February 2017 at 14:52, Dan Berindei <dan.berin...@gmail.com> wrote:
> On Tue, Feb 21, 2017 at 10:39 AM, Sanne Grinovero <sa...@infinispan.org> wrote:
>> On 21 February 2017 at 07:37, Dan Berindei <dan.berin...@gmail.com> wrote:
>>> On Mon, Feb 20, 2017 at 8:02 PM, Sanne Grinovero <sa...@infinispan.org> wrote:
>>>> -1 to batch removal
>>>>
>>>> Frankly, I've always been extremely negative about the fact that
>>>> batches are built on top of transactions. It's easy to find several
>>>> iterations of rants of mine on this mailing list, especially fierce
>>>> debates with Mircea. So I'd welcome a separation of these features.
>>>>
>>>> Yet, removing batching seems very wrong. I disagree that people should
>>>> use transactions instead: for many use cases it's overkill, and for
>>>> others - and this is the main pain point I always had with the current
>>>> design - it might make sense to have a batch (or more than one) within
>>>> a transaction.
>>>> I have had practical problems with needing to "flush" a single batch
>>>> within a transaction because the size of the combined elements was
>>>> getting too large. That doesn't imply at all that I'm ready to commit.
>>>>
>>>
>>> WDYM by "flush" here? I have a feeling this is nothing like our
>>> batching ever was...
>>
>> I'm just listing points about why tying the batch concept to
>> transactions is not practical.
>>
>> I mean that when one has to write very large amounts of data, you need to
>> break it up into smaller batches; *essentially* to flush the first batch
>> before starting the second one.
>> So that is unrelated to atomicity and consistency, as in practice one has
>> to split one business operation into multiple batches.
>>
>
> Kind of repeating Radim's question, but how is this better than having
> multiple small transactions and committing each one separately?
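[Editor's note: the chunked-flush pattern Sanne describes - one business operation split into several bounded writes - can be sketched roughly as below. This is a hypothetical illustration, not Infinispan API; the `chunk` helper and the class name are made up for the example.]

```java
import java.util.*;

// Hypothetical sketch: split one large business write into fixed-size
// chunks so each chunk can be flushed on its own, without tying the
// chunk boundaries to a transaction commit.
public class ChunkedWrites {
    static <K, V> List<Map<K, V>> chunk(Map<K, V> entries, int chunkSize) {
        List<Map<K, V>> chunks = new ArrayList<>();
        Map<K, V> current = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : entries.entrySet()) {
            current.put(e.getKey(), e.getValue());
            if (current.size() == chunkSize) {
                chunks.add(current);
                current = new LinkedHashMap<>();
            }
        }
        if (!current.isEmpty()) chunks.add(current);
        return chunks;
    }

    public static void main(String[] args) {
        Map<String, String> data = new LinkedHashMap<>();
        for (int i = 0; i < 10; i++) data.put("k" + i, "v" + i);
        // Each chunk would be written with one putAll()-style call,
        // keeping the per-flush payload bounded.
        List<Map<String, String>> chunks = chunk(data, 4);
        System.out.println(chunks.size()); // prints 3 (chunks of 4, 4, and 2)
    }
}
```

The point of the sketch is that the chunk boundary is a payload-size decision, independent of when (or whether) the surrounding transaction commits.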
If you assume that the code integrating with Infinispan is in control of
the transaction boundaries - for example, able to decide when it's time
to commit - then you cannot integrate with other transactional
components, which defeats the purpose of exposing JTA integration.

>
>>>
>>>> @Pedro: the fact that some code is broken today is not relevant, when
>>>> there's a need for such features. Like you suggest, it had bad premises
>>>> (building it on TX), so we should address that, but not throw it all out.
>>>>
>>>
>>> Infinispan never created nested batches:
>>
>> I know. I'm not describing the current implementation, I'm describing use
>> cases which are not addressed, or in which Infinispan is clumsy to use,
>> to suggest improvements.
>> And I'm not asking to have nested batches. Again, just pointing
>> out practical problems with the current batch design.
>>
>> It should be possible to run an efficient batch of operations within a
>> transaction.
>> Or a sequence of batches, all within the same transaction.
>> N.B. you could see that as a "nested batch" in the current twisted idea that
>> a batch is a transaction - which is what I'm arguing against.
>>
>
> I'm sure there are use cases that we don't address properly, but
> I don't know enough about those use cases or your proposed batching
> API improvements to really say anything about them. The only thing I'm
> confident about is that the current batching API is almost certainly
> not the answer.
>
>>
>>> calling startBatch() when a
>>> batch was already associated with the current thread just incremented
>>> a refcount, and only the final endBatch() did any work. OTOH running a
>>> batch within a transaction always worked very much like suspending the
>>> current transaction, starting a new one, and committing it on
>>> endBatch().
>>> So the only real difference between batching and using
>>> DummyTransactionManager is that batching is limited to one cache's
>>> operations, while DummyTransactionManager supports multiple resources.
>>>
>>>
>>>> @Bela is making spot-on objections based on use cases, which need to
>>>> be addressed in some way. The "atomic" operations still don't work
>>>> right [1] in Infinispan, and that's a big usability problem.
>>>>
>>>
>>> Batching was never about sending updates asynchronously. We have
>>> putAllAsync() for that, which doesn't need transactions, and it's even
>>> slightly more efficient without them.
>>
>> If you think this way, it sounds like you give no value to application
>> performance: people need more than a couple of put operations.
>>
>
> Unfortunately, going beyond simple put operations, e.g. conditional
> writes, is exactly where asynchronous replication fails. Even doing
> simple put operations and making sure that those values are written to
> the cache is an impossible task with asynchronous replication.
> Considering that the batch methods are called startAtomic() and
> endAtomic() in TreeCache, I really don't think they were created with
> asynchronous replication in mind.
>
> Off-topic: the BatchingCache#startBatch() javadoc was never true for
> distributed caches: writes are queued, but reads (or puts without
> IGNORE_RETURN_VALUES) always result in a remote call. Locks cause
> remote calls in a cache with pessimistic locking, too, although one
> could argue that back in version 4.0, locks were indeed acquired
> locally during the write and remotely at the end of the batch.
>
>>>
>>> And atomic operations have bugs, yes, but I'm not sure how
>>> implementing a new kind of batching that isn't based on transactions
>>> would help with that.
>>>
>>>
>>>> +1 to removing async TX
>>>>
>>>> I actually like the concept, but the API should be different... might as
>>>> well remove this for now.
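[Editor's note: the startBatch()/endBatch() refcounting Dan describes above - nested starts only bump a counter, and only the outermost endBatch() does any work - can be made concrete with a minimal sketch. This is not the real Infinispan implementation, just an illustration of the semantics.]

```java
// Minimal sketch of the refcounting semantics Dan describes: nested
// startBatch() calls only increment a per-thread counter, and only the
// endBatch() matching the outermost startBatch() completes the batch.
public class BatchRefCount {
    private final ThreadLocal<Integer> depth = ThreadLocal.withInitial(() -> 0);

    /** @return true if this call actually began a new batch */
    public boolean startBatch() {
        int d = depth.get();
        depth.set(d + 1);
        return d == 0;
    }

    /** @return true if this call actually completed the batch */
    public boolean endBatch() {
        int d = depth.get() - 1;
        depth.set(d);
        return d == 0; // only the outermost endBatch() does the real work
    }
}
```

This is why "nested batches" never existed as a feature: the inner start/end pairs are no-ops by construction.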
>>>>
>>>> +1 to removing the Tree module
>>>>
>>>> I personally never used it, as you all advised against it, yet it's been
>>>> often requested by users; sometimes explicitly, and other times not so
>>>> explicitly, yet people often have needs which would be solved by a good
>>>> Tree module.
>>>> I understand the reason you all want to remove it: it's buggy. But I
>>>> believe the real reason is that it should have been built on top of a
>>>> proper batch API, using real MVCC. In that case it wouldn't have
>>>> been buggy, nor too hard to maintain, and it might have attracted way
>>>> more interest as well.
>>>
>>> I think the fact that we haven't been able to build a "proper" batch
>>> API using real MVCC yet is proof to the contrary...
>>>
>>>
>>>> I'd remove it as a temporary measure: delete the bad stuff, but
>>>> hopefully it could be reintroduced, built on better principles, at some
>>>> point in the future?
>>>>
>>>> Thanks,
>>>> Sanne
>>>>
>>>> [1] - "right" as in user expectations and actual practical use, which
>>>> is currently different from the twisted definition of "right" that
>>>> the team has been using as an excuse ;-) I'll also prove that it
>>>> doesn't work right according to your own twisted specs, when I find
>>>> the time to finish some tests...
>>>>
>>>>
>>>> On 20 February 2017 at 16:48, Pedro Ruivo <pe...@infinispan.org> wrote:
>>>>>
>>>>>
>>>>> On 20-02-2017 16:12, Bela Ban wrote:
>>>>>>
>>>>>>
>>>>>> On 20/02/17 17:06, Tristan Tarrant wrote:
>>>>>>> Hi guys, we discussed this a little bit in the past and this
>>>>>>> morning on IRC. Here are some proposed removals:
>>>>>>>
>>>>>>> - Remove the async transactional modes, as they are quite pointless
>>>>>>> - Remove batching: users should use transactions
>>>>>>
>>>>>> How do you make a bunch of modifications and send them asynchronously if
>>>>>> both batching *and* async TXs are getting removed?
>>>>>
>>>>> We are not removing features, we are removing broken code.
>>>>>
>>>>> Batching uses transactions, and an async transaction doesn't make
>>>>> sense, since Infinispan has to report the outcome to the
>>>>> TransactionManager.
>>>>>
>>>>> Our current async-tx is broken in that it starts to commit and
>>>>> reports OK to the transaction manager; if you have a topology change
>>>>> or a conflict, you will end up with inconsistent data.
>>>>>
>>>>> So why do we keep this code around?
>>>>>
>>>>> You can still move a transaction's commit to another thread if you
>>>>> don't want to wait for it:
>>>>>
>>>>> tm.begin()
>>>>> doWork()
>>>>> tx = tm.suspend()
>>>>> new_thread {
>>>>>   tm.resume(tx)
>>>>>   tm.commit()
>>>>> }
>>>>>
>>>>> The best thing I can think of is to keep the batching API and
>>>>> re-implement it to provide an endBatchAsync() that would do the above.
>>>>>
>>>>>>
>>>>>> So if someone wants to apply a unit of work *atomically* (either all
>>>>>> modifications within that unit of work are applied, or none), what
>>>>>> alternatives exist?
>>>>>>
>>>>>>> - Remove the tree module: it doesn't work properly, and uses batching
>>>>>>>
>>>>>>> Please cast your votes
>>>>>>>
>>>>>>> Tristan
>>>>>>>
>>>>>>
>>>>> _______________________________________________
>>>>> infinispan-dev mailing list
>>>>> infinispan-dev@lists.jboss.org
>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
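[Editor's note: Pedro's suspend/resume snippet above maps onto the JTA `TransactionManager` pattern of begin on one thread, `suspend()` to detach the transaction, then `resume()` and `commit()` on another thread. The runnable sketch below uses a hypothetical `TinyTm` stand-in rather than a real JTA implementation, just to make the control flow concrete.]

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Runnable sketch of Pedro's pattern: do the work on one thread, then
// hand the transaction to another thread for the commit. TinyTm is a
// hypothetical stand-in for a JTA TransactionManager, modelling only
// begin/suspend/resume/commit.
public class AsyncCommit {
    static class Tx { volatile boolean committed; }

    static class TinyTm {
        private final ThreadLocal<Tx> current = new ThreadLocal<>();
        void begin() { current.set(new Tx()); }
        Tx suspend() { Tx tx = current.get(); current.remove(); return tx; }
        void resume(Tx tx) { current.set(tx); }
        void commit() { current.get().committed = true; current.remove(); }
    }

    public static void main(String[] args) throws Exception {
        TinyTm tm = new TinyTm();
        tm.begin();
        // doWork() would happen here, enlisted in the current transaction
        Tx tx = tm.suspend();              // detach the tx from this thread
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<?> done = pool.submit(() -> {
            tm.resume(tx);                 // attach the tx to the new thread
            tm.commit();                   // original caller doesn't wait
        });
        done.get();                        // only for the demo
        pool.shutdown();
        System.out.println(tx.committed);
    }
}
```

The caveat Pedro raises still applies: moving the commit off-thread changes who waits, not whether the commit can fail, so the outcome still has to be observed somewhere.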