Sergei Golubchik <s...@mariadb.org> writes: >> Note that at this point, the group commit leader thread _is_ >> effectively acting as if it was each of the participating threads, the >> other threads are blocked from running during commit_ordered. > > I'm not sure it's good enough. Practically it might work, but if one > thread has modified some variable under a mutex, you cannot simply read > it from another thread without a mutex, there's no guarantee that > another thread will see the new value. You'll need an appropriate memory > barrier for that.
Sure, I understand that, maybe I should have explained this code in more detail. This is the binlog group commit. We have N transactions doing commit at the same time: T1, T2, ..., TN. After writing to the binlog, we must run commit_ordered(T_i) for each transaction in order, i=1, 2, ..., N. The wait_for_prior_commit mechanism exists for this. Each thread will do: wait_for_prior_commit(T_{i-1}) commit_ordered(T_i) wakeup_subsequent_commits() But this is slow, since it requires thread scheduling between N threads, in-order: https://knielsen-hq.org/w/benchmarking-thread-scheduling-in-group-commit/ Instead, what we do is that T_1 will call commit_ordered() for _all_ the N transactions. Meanwhile the other threads are waiting to be signalled. Note that LOCK_wait_commit protects the hand-over from thread T_i to T_1 and back, so we _do_ have proper mutex protection. This mutex provides the necessary memory barrier between T_i modifying data and T_1 reading it, or T_1 modifying and T_i reading. Does that make it clear? >> If there is some reason, then the correct fix could be to set >> current_thd for the duration of run_commit_ordered, which will satisfy >> the assertion and presumably the actual bug? > > Setting current_thd is cheating, it will trick the assert, but won't fix > the underlying problem as outlined above. Nono, it's not cheating, it's in fact setting the correct value. While running commit_ordered(T_i), the current THD _is_ the THD for T_i, not the THD for T_1, which is unrelated to transaction T_i. It's already done for a similar case a bit earlier in the code, where T_1 is writing the binlog for each of the other threads: set_current_thd(current->thd); It's similar to the pool-of-threads. If a THD goes idle for some time, and then is resumed on a different thread in the thread pool, this does not mean that every access to the data must now be done under LOCK_thd_data. Rather, now the THD is owned by another thread, so this thread sets current_thd accordingly. The group commit code is similar, the THD is (temporarily) moved to be under a different thread. Does that make sense? > Monty asserted many times that taking an uncontended mutex is > cheap. Did you think it's expensive or did you benchmark that? I did, actually! (in the past). Taking a mutex needs to reserve the associated cache line, which is not free. "Expensive" is relative, of course. But here, LOCK_thd_data serves no purpose, and it's done under a very critical global lock LOCK_commit_ordered. So surely we should not be taking/releasing N extra mutexes here without any reason to do so? For the same reason, set_current_thd() was not done in this code, to avoid it if it serves no purpose. This is documented on the commit_ordered method in sql/handler.h: "Note that commit_ordered() can be called from a different thread than the one handling the transaction! So it can not do anything that depends on thread local storage" That is why I asked if you knew what the real problem is that triggered this assertion. There is no reason to add code in a performance-critical place if it serves no purpose. Worse, it might hide the real bug if some code was added to commit_ordered that violates the above requirement. I think it is fine to add set_current_thd() *if it is actually needed*. But we should understand first *why* it is needed, other wise we may hide the real bug. - Kristian. _______________________________________________ developers mailing list -- developers@lists.mariadb.org To unsubscribe send an email to developers-le...@lists.mariadb.org