Hi Sergei, below are results of "operf -e LLC_MISSES:10000:0x41 --pid `pidof mysqld`". Tested with performance schema and general log off. tc_acquire_table() seem to be inlined, so it appears as tdc_acquire_share().
Also note that my_malloc_size_cb_func() looks like another bottleneck. Misses summary for tc_acquire_table()/tc_release_table() -------------------------------------------------------- 4.5408 + 3.0090= 7.5498% (10.0) 2.9916 + 2.6908= 5.6824% (10.0 + MDEV4956) 2.3502 + 1.7159= 4.0661% (10.0 + MDEV4956 + no unused_tables) 10.0 (rev.3912) --------------- CPU: Intel Sandy Bridge microarchitecture, speed 2.701e+06 MHz (estimated) Counted LLC_MISSES events (Last level cache demand requests from this core that missed the LLC) with a unit mask of 0x41 (No unit mask) count 10000 samples % image name symbol name 40292 34.8688 no-vmlinux /no-vmlinux 21638 18.7256 libpthread-2.15.so pthread_mutex_lock 6718 5.8138 libpthread-2.15.so pthread_mutex_unlock 5247 4.5408 mysqld tc_release_table(TABLE*) 3477 3.0090 mysqld tdc_acquire_share(THD*, char const*, char const*, char const*, unsigned int, unsigned int, TABLE**) 3410 2.9510 mysqld TABLE::init(THD*, TABLE_LIST*) 3316 2.8697 mysqld my_malloc_size_cb_func 3126 2.7053 mysqld open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int, Prelocking_strategy*) 2360 2.0424 mysqld dispatch_command(enum_server_command, THD*, char*, unsigned int) 2254 1.9506 libpthread-2.15.so pthread_rwlock_unlock 2152 1.8623 libpthread-2.15.so __lll_lock_wait 1867 1.6157 mysqld heap_info 1525 1.3197 mysqld tdc_refresh_version() 1441 1.2470 mysqld read_lock_type_for_table(THD*, Query_tables_list*, TABLE_LIST*) 1374 1.1891 mysqld make_join_statistics(JOIN*, List<TABLE_LIST>&, Item*, st_dynamic_array*) 1218 1.0541 mysqld get_lock_data(THD*, TABLE**, unsigned int, unsigned int) 1212 1.0489 mysqld THD::enter_stage(PSI_stage_info_v1 const*, PSI_stage_info_v1*, char const*, char const*, unsigned int) 1062 0.9191 mysqld _my_thread_var 945 0.8178 libc-2.15.so __memset_sse2 875 0.7572 mysqld thr_unlock 724 0.6266 mysqld lock_tables(THD*, TABLE_LIST*, unsigned int, unsigned int) 673 0.5824 mysqld thr_multi_lock 10.0 + MDEV4956 --------------- CPU: Intel Sandy Bridge microarchitecture, speed 2.701e+06 MHz (estimated) Counted LLC_MISSES events (Last level cache demand requests from this core that missed the LLC) with a unit mask of 0x41 (No unit mask) count 10000 samples % image name symbol name 43070 35.7909 no-vmlinux /no-vmlinux 22661 18.8311 libpthread-2.15.so pthread_mutex_lock 7960 6.6147 libpthread-2.15.so pthread_mutex_unlock 3600 2.9916 mysqld tdc_acquire_share(THD*, char const*, char const*, char const*, unsigned int, unsigned int, TABLE**) 3498 2.9068 mysqld my_malloc_size_cb_func 3238 2.6908 mysqld tc_release_table(TABLE*) 3189 2.6500 mysqld TABLE::init(THD*, TABLE_LIST*) 2611 2.1697 libpthread-2.15.so __lll_lock_wait 2414 2.0060 mysqld dispatch_command(enum_server_command, THD*, char*, unsigned int) 2360 1.9611 mysqld open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int, Prelocking_strategy*) 2250 1.8697 mysqld heap_info 1889 1.5697 mysqld read_lock_type_for_table(THD*, Query_tables_list*, TABLE_LIST*) 1557 1.2939 mysqld make_join_statistics(JOIN*, List<TABLE_LIST>&, Item*, st_dynamic_array*) 1378 1.1451 mysqld THD::enter_stage(PSI_stage_info_v1 const*, PSI_stage_info_v1*, char const*, char const*, unsigned int) 1319 1.0961 mysqld get_lock_data(THD*, TABLE**, unsigned int, unsigned int) 1293 1.0745 mysqld thr_multi_lock 1267 1.0529 libpthread-2.15.so pthread_rwlock_unlock 1211 1.0063 mysqld _my_thread_var 1136 0.9440 libc-2.15.so __memset_sse2 887 0.7371 mysqld thr_unlock 736 0.6116 mysqld free_root 698 0.5800 mysqld tdc_refresh_version() 10.0 + MDEV4956 + no unused_tables ---------------------------------- CPU: Intel Sandy Bridge microarchitecture, speed 2.701e+06 MHz (estimated) Counted LLC_MISSES events (Last level cache demand requests from this core that missed the LLC) with a unit mask of 0x41 (No unit mask) count 10000 samples % image name symbol name 41828 35.5653 no-vmlinux /no-vmlinux 22377 19.0266 libpthread-2.15.so pthread_mutex_lock 8219 6.9884 libpthread-2.15.so pthread_mutex_unlock 3578 3.0423 mysqld my_malloc_size_cb_func 3244 2.7583 mysqld TABLE::init(THD*, TABLE_LIST*) 2764 2.3502 mysqld tdc_acquire_share(THD*, char const*, char const*, char const*, unsigned int, unsigned int, TABLE**) 2630 2.2362 libpthread-2.15.so __lll_lock_wait 2570 2.1852 mysqld dispatch_command(enum_server_command, THD*, char*, unsigned int) 2407 2.0466 mysqld open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int, Prelocking_strategy*) 2233 1.8987 mysqld heap_info 2027 1.7235 mysqld read_lock_type_for_table(THD*, Query_tables_list*, TABLE_LIST*) 2018 1.7159 mysqld tc_release_table(TABLE*) 1537 1.3069 mysqld make_join_statistics(JOIN*, List<TABLE_LIST>&, Item*, st_dynamic_array*) 1404 1.1938 mysqld THD::enter_stage(PSI_stage_info_v1 const*, PSI_stage_info_v1*, char const*, char const*, unsigned int) 1368 1.1632 mysqld _my_thread_var 1343 1.1419 mysqld thr_multi_lock 1284 1.0918 mysqld get_lock_data(THD*, TABLE**, unsigned int, unsigned int) 1213 1.0314 libc-2.15.so __memset_sse2 1162 0.9880 libpthread-2.15.so pthread_rwlock_unlock 799 0.6794 mysqld thr_unlock 753 0.6403 mysqld tdc_refresh_version() 717 0.6096 mysqld THD::set_open_tables(TABLE*) Regards, Sergey On Mon, Sep 16, 2013 at 08:50:49PM +0400, Sergey Vojtovich wrote: > Hi Sergei, > > just found another interesting test result. I added dummy LOCK_table_share > mutex > lock and unlock to tc_acquire_table() and tc_release_table() (before locking > LOCK_open), just to measure pure mutex wait time. > > Test execution time: 45s > LOCK_open wait time: 34s > LOCK_table_share wait time: 0.8s > > +--------------------------------------------------------+------------+----------------+ > | event_name | count_star | > sum_timer_wait | > +--------------------------------------------------------+------------+----------------+ > | wait/synch/mutex/sql/LOCK_open | 585690 | > 34298972259258 | > | wait/synch/mutex/mysys/THR_LOCK::mutex | 585604 | > 4560420039042 | > | wait/synch/mutex/sql/TABLE_SHARE::tdc.LOCK_table_share | 585710 | > 794564626359 | > | wait/synch/rwlock/sql/LOCK_tdc | 290940 | > 237751940139 | > | wait/synch/mutex/sql/THD::LOCK_thd_data | 1838668 | > 219829105251 | > | wait/synch/rwlock/innodb/hash table locks | 683395 | > 159792339294 | > | wait/synch/rwlock/innodb/btr_search_latch | 290892 | > 138915354207 | > | wait/synch/mutex/innodb/trx_sys_mutex | 62940 | > 78334973451 | > | wait/synch/rwlock/innodb/index_tree_rw_lock | 167822 | > 49323455349 | > | wait/synch/rwlock/sql/MDL_lock::rwlock | 41970 | > 31436713938 | > +--------------------------------------------------------+------------+----------------+ > > Regards, > Sergey > > On Mon, Sep 16, 2013 at 04:46:41PM +0400, Sergey Vojtovich wrote: > > Hi Sergei, > > > > On Sat, Sep 14, 2013 at 04:44:28PM +0200, Sergei Golubchik wrote: > > > Hi, Sergey! > > > > > > On Sep 13, Sergey Vojtovich wrote: > > > > Hi Sergei, > > > > > > > > comments inline and a question: 10.0 throughput is twice lower than 5.6 > > > > in a specific case. It is known to be caused by tc_acquire_table() and > > > > tc_release_table(). Do we want to fix it? If yes - how? > > > > > > How is it caused by tc_acquire_table/tc_release_table? > > Threads spend a lot of time waiting for LOCK_open in these functions. > > Because protected by LOCK_open code takes a lot of time to execute. > > > > > In what specific case? > > The case is: many threads access one table (read-only OLTP). > > > > > > > > > > > > Why per-share lists are updated under the global mutex? > > > > > > Alas, it doesn't solve CPU cache coherence problem. > > > > > It doesn't solve CPU cache coherence problem, yes. > > > > > And it doesn't help if you have only one hot table. > > > > > But it certainly helps if many threads access many tables. > > > > Ok, let's agree to agree: it will help in certain cases. Most probably > > > > it > > > > won't improve situation much if all threads access single table. > > > > > > Of course. > > > > > > > We could try to ensure that per-share mutex is on the same cache line as > > > > free_tables and used_tables list heads. In this case I guess > > > > mysql_mutex_lock(&share->tdc.LOCK_table_share) will load list heads into > > > > CPU cache along with mutex structure. OTOH we still have to read > > > > per-TABLE > > > > prev/next pointers. And in 5.6 per-partition mutex should less > > > > frequently > > > > jump out of CPU cache than our per-share mutex. Worth trying? > > > > > > Did you benchmark that these cache misses are a problem? > > > What is the main problem that impacts the performance? > > We (Axel and me) did a lot of different benchmarks before we concluded > > cache misses to be the main problem. Please let me known if you're > > interested > > in specific results - we either find them in benchmark archives or benchmark > > again. > > > > One of interesting results I just found is as following... > > 10.0.4, read-only OLTP, 64 threads, tps ~10000 > > +---------------------------------------------+------------+-----------------+ > > | event_name | count_star | sum_timer_wait > > | > > +---------------------------------------------+------------+-----------------+ > > | wait/synch/mutex/sql/LOCK_open | 2784632 | > > 161835901661916 | > > | wait/synch/mutex/mysys/THR_LOCK::mutex | 2784556 | > > 28804019775192 | > > ...skip... > > > > Note that LOCK_open and THR_LOCK::mutex are contested equally, but wait time > > differs ~6x. > > > > Removing used_tables from tc_acquire_table/tc_release_table makes > > sum_timer_wait > > go down from 161s to 100s. > > > > Regards, > > Sergey > > > > _______________________________________________ > > Mailing list: https://launchpad.net/~maria-developers > > Post to : [email protected] > > Unsubscribe : https://launchpad.net/~maria-developers > > More help : https://help.launchpad.net/ListHelp > > _______________________________________________ > Mailing list: https://launchpad.net/~maria-developers > Post to : [email protected] > Unsubscribe : https://launchpad.net/~maria-developers > More help : https://help.launchpad.net/ListHelp _______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : [email protected] Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp

