Hi Sergei,

On Sat, Sep 14, 2013 at 04:44:28PM +0200, Sergei Golubchik wrote:
> Hi, Sergey!
>
> On Sep 13, Sergey Vojtovich wrote:
> > Hi Sergei,
> >
> > comments inline and a question: 10.0 throughput is twice lower than 5.6
> > in a specific case. It is known to be caused by tc_acquire_table() and
> > tc_release_table(). Do we want to fix it? If yes - how?
>
> How is it caused by tc_acquire_table/tc_release_table?
Threads spend a lot of time waiting for LOCK_open in these functions,
because the code protected by LOCK_open takes a long time to execute.

> In what specific case?
The case is: many threads access one table (read-only OLTP).

> > > > > Why per-share lists are updated under the global mutex?
> > > > Alas, it doesn't solve CPU cache coherence problem.
> > > It doesn't solve CPU cache coherence problem, yes.
> > > And it doesn't help if you have only one hot table.
> > > But it certainly helps if many threads access many tables.
> > Ok, let's agree to agree: it will help in certain cases. Most probably it
> > won't improve situation much if all threads access single table.
>
> Of course.
>
> > We could try to ensure that per-share mutex is on the same cache line as
> > free_tables and used_tables list heads. In this case I guess
> > mysql_mutex_lock(&share->tdc.LOCK_table_share) will load list heads into
> > CPU cache along with mutex structure. OTOH we still have to read per-TABLE
> > prev/next pointers. And in 5.6 per-partition mutex should less frequently
> > jump out of CPU cache than our per-share mutex. Worth trying?
>
> Did you benchmark that these cache misses are a problem?
> What is the main problem that impacts the performance?
Axel and I ran a lot of different benchmarks before we concluded that cache
misses are the main problem. Please let me know if you're interested in
specific results: we can either find them in the benchmark archives or run
the benchmarks again.

One interesting result I just found is the following...

10.0.4, read-only OLTP, 64 threads, tps ~10000
+---------------------------------------------+------------+-----------------+
| event_name                                  | count_star | sum_timer_wait  |
+---------------------------------------------+------------+-----------------+
| wait/synch/mutex/sql/LOCK_open              |    2784632 | 161835901661916 |
| wait/synch/mutex/mysys/THR_LOCK::mutex      |    2784556 |  28804019775192 |
...skip...

Note that LOCK_open and THR_LOCK::mutex are waited on almost equally often
(see count_star), but the total wait time differs by ~6x.
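
As a rough cross-check of the figures above, here is a minimal C++ sketch that
converts the two sum_timer_wait values to seconds and to an average per wait.
It assumes sum_timer_wait is reported in picoseconds (the performance_schema
timer unit). WaitStat, total_seconds, avg_wait_us and wait_ratio are
hypothetical helper names made up for this illustration, not server code.

```cpp
#include <cassert>
#include <cstdint>

// One row of the wait summary shown above (values copied from the table).
struct WaitStat {
  const char *event_name;
  std::uint64_t count_star;      // number of recorded waits
  std::uint64_t sum_timer_wait;  // total wait time, ASSUMED to be picoseconds
};

constexpr double kPicosPerSecond = 1e12;
constexpr double kPicosPerMicro = 1e6;

// Total time spent waiting on the mutex, in seconds.
inline double total_seconds(const WaitStat &s) {
  return static_cast<double>(s.sum_timer_wait) / kPicosPerSecond;
}

// Average duration of a single wait, in microseconds.
inline double avg_wait_us(const WaitStat &s) {
  assert(s.count_star > 0);
  return static_cast<double>(s.sum_timer_wait) / kPicosPerMicro /
         static_cast<double>(s.count_star);
}

// How many times longer the total wait on `a` is compared to `b`.
inline double wait_ratio(const WaitStat &a, const WaitStat &b) {
  assert(b.sum_timer_wait > 0);
  return static_cast<double>(a.sum_timer_wait) /
         static_cast<double>(b.sum_timer_wait);
}

const WaitStat kLockOpen = {"wait/synch/mutex/sql/LOCK_open",
                            2784632ULL, 161835901661916ULL};
const WaitStat kThrLock = {"wait/synch/mutex/mysys/THR_LOCK::mutex",
                           2784556ULL, 28804019775192ULL};
```

With these inputs total_seconds() gives roughly 162s for LOCK_open and roughly
29s for THR_LOCK::mutex, i.e. about 58us versus about 10us per wait, which is
where the ~6x comes from.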
Removing used_tables from tc_acquire_table/tc_release_table makes
sum_timer_wait go down from 161s to 100s.

Regards,
Sergey
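
For reference, the cache-line idea quoted above could look roughly like the
sketch below. This is not MariaDB code: ShareHot, TableNode, kCacheLine
(assumed to be 64 bytes), line_of and heads_share_line_with_mutex are
hypothetical names, and std::mutex stands in for mysql_mutex_t. It only shows
placing the mutex and both list heads in one aligned struct and checking at
run time whether they ended up on the same cache line.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>

constexpr std::size_t kCacheLine = 64;  // ASSUMED cache line size in bytes

// Hypothetical stand-in for the per-TABLE list linkage.
struct TableNode {
  TableNode *prev = nullptr;
  TableNode *next = nullptr;
};

// Hypothetical "hot" part of a table share: the mutex and the two list
// heads it protects, aligned so that the struct starts on a cache line.
struct alignas(kCacheLine) ShareHot {
  std::mutex lock;                   // stand-in for LOCK_table_share
  TableNode *free_tables = nullptr;  // head of the free TABLE list
  TableNode *used_tables = nullptr;  // head of the used TABLE list
};

// Index of the cache line that contains address p.
inline std::uintptr_t line_of(const void *p) {
  return reinterpret_cast<std::uintptr_t>(p) / kCacheLine;
}

// True if both list heads lie entirely on the cache line where the mutex
// starts. Whether this holds depends on the platform's mutex size.
inline bool heads_share_line_with_mutex(const ShareHot &s) {
  const char *last_byte =
      reinterpret_cast<const char *>(&s.used_tables) + sizeof(s.used_tables) - 1;
  return line_of(&s.lock) == line_of(&s.free_tables) &&
         line_of(&s.lock) == line_of(last_byte);
}
```

Even when the check passes, this only covers the mutex and the list heads. As
noted in the quoted text, the per-TABLE prev/next pointers live elsewhere and
still have to be read separately.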

