Hi, Kristian! On Sep 06, Kristian Nielsen wrote: > However, as I revisited the algorithm, it occured to me that it is in > any case better to wake up threads individually as soon as > commit_ordered() has finished. This way, the first threads in the > queue are free to continue doing useful work while we are still > running commit_ordered() for the last threads. > > So now the algorithm is something like this: > > thd->ready= false > lock(LOCK_prepare_ordered) > old_queue= group_commit_queue > thd->next= old_queue > group_commit_queue= thd > ht->prepare_ordered() > unlock(LOCK_prepare_ordered) > > if (old_queue == NULL) // leader? > lock(LOCK_group_commit) > > lock(LOCK_prepare_ordered) > queue= reverse(group_commit_queue) > group_commit_queue= NULL > unlock(LOCK_prepare_ordered) > > group_log_xid(queue) > > lock(LOCK_commit_ordered) // but see below > unlock(LOCK_group_commit) > for thd2 in <queue> > lock(thd2->LOCK_wakeup) > thd2->ready= true > signal(thd2->COND_wakeup) > unlock(thd2->LOCK_wakeup) > unlock(LOCK_commit_ordered) > else > lock (thd->LOCK_wakeup) > while (!thd->ready) > wait(COND_wakeup, LOCK_wakeup) > unlock (thd->LOCK_wakeup) > > cookie= xid_log_after()
Where in this algorithm you call ht->commit_ordered() ? > On the other hand, the algorithm I suggested earlier for START > TRANSACTION WITH CONSISTENT SNAPSHOT used the LOCK_commit_ordered, and > there might be other uses... > > So I am not sure. I'd like to think more about it, or what do you > think? START TRANSACTION WITH CONSISTENT SNAPSHOT is a good reason to keep the mutex. > >> It would be possible to iterate over the queue to invoke > >> prepare_ordered() in sequence from a single thread, just like > >> group_log_xid() and commit_ordered(). But this would delay the > >> calls until the previous group commit is done and the next one > >> starts > > > > No, why ? You only need one atomic fetch_and_store to copy the queue > > head to a local variable and reset the queue. Then you can call > > prepare_ordered or commit_ordered in the queue order without any > > mutex. > > I am not sure if I understood your suggestion correctly. But what I > considered with the above statement about "delaying calls to > prepare_ordered()" is this: > > Just like the group leader thread runs commit_ordered() in a loop over > the queue just after group_log_xid(), we could have it do a similar > loop for prepare_ordered() just before group_log_xid(). Yes. > But I choose to do it earlier, as soon as the transaction is put in > the queue and commit order thereby defined. > > There can be quite a "long" time interval between these two events: > the time it takes for the previous group_log_xid() (eg. an fsync()), > plus sometimes one wants to add extra sleeps in group commit to group > more transactions together. No. The long interval is *inside* the group_log_xid(), while you call prepare_ordered() *before* it. But anyway, the LOCK_prepare_ordered mutex is not going to be contented, so removing it by using a lock-free queue (that's what this second approach is about) will not bring any noticeable benefits. > The main performance bottleneck I am introducing is, I think, the > serialisation of the commit_ordered() part of commit. Not just for > some particular engine implementation, but for the interface. That is > not a decision to be taken lightly. > > Of course, compared to InnoDB today, it's much better, as it gets rid > of the InnoDB prepare_commit_mutex (which spans all the way from end > of prepare() to end of what is effectively commit_ordered()), and also > avoids taking LOCK_log over all of log_xid() in the binlog. > > But for something like NDB, I think serialised commit order would > really hurt (if it even makes sense ...) > > Maybe the answer here is that engines can choose to support > commit_ordered() or not (and NDB-like engines probably will not). And > if not, there is just no support for consistent commit order. > > And if we implement the simple way to recover engines from binlog > without fsync() in prepare() and commit(), then it will only work for > engines supporting commit_ordered(). Later we could implement the more > complex recovery without need for commit_ordered() support. It's reasonable to say that if an engine does not implement commit_ordered() then it needs to take care of its own recovery and fsync both in prepare and commit. Regards, Sergei _______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp