Hi Sergei,

I just realized that I didn't share benchmark results (read-only OLTP that XL
did):

5.6 tps: ~18k
10.0 tps: ~9k
10.0 + MDEV-4956 tps: ~11k

I estimate tc_acquire_table and tc_release_table are eating up ~6k tps (2k per
list).

Regards,
Sergey

On Tue, Sep 10, 2013 at 09:11:16PM +0200, Sergei Golubchik wrote:
> Hi, Sergey!
>
> On Sep 10, Sergey Vojtovich wrote:
> > Hi Sergei,
> >
> > thanks for looking into this patch. Frankly speaking I find it a bit
> > questionable too. Below are links that should answer your questions...
> > What problem do I attempt to solve:
> > https://lists.launchpad.net/maria-developers/msg06118.html
> > How do I attempt to solve it: https://mariadb.atlassian.net/browse/MDEV-4956
>
> Yes, I've seen and remember both, but they don't answer my question,
> which was about specific changes that you've done, not about the goal.
> But ok, see below.
>
> > For every statement we acquire table from table cache and then release
> > table back to the cache. That involves update of 3 lists: unused_tables,
> > per-share used_tables and free_tables. These lists are protected by
> > LOCK_open (see tc_acquire_table() and tc_release_table()).
>
> Why per-share lists are updated under the global mutex?
>
> > Every time we update global pointer, corresponding cache lines of
> > sibling CPUs have to be invalidated. This is causing expensive memory
> > reads while LOCK_open is held.
> >
> > Oracle solved this problem by partitioning table cache, allowing
> > emulation of something like per-CPU lists.
> >
> > We attempted to split LOCK_open logically, and succeeded at everything
> > but these 3 lists. I attempted lock-free list for free_tables, but TPS
> > rate didn't improve.
>
> How did you do the lock-free list, could you show, please?
>
> > What we need is to reduce number of these expensive memory reads, and
> > there are two solutions: partition these lists or get rid of them. As
> > we agreed not to partition, I'm trying the latter solution.
>
> Well, you can partition the list. With 32 list head pointers. And a
> thread adding a table only to "this thread's" list.
> Of course, it's not complete partitioning between CPUs, as any thread can
> remove a table from any list. But at least there won't be one global list
> head pointer.
>
> > Why I find this patch questionable? It reduces LOCK_open wait time by
> > 30%, to get close to Oracle wait time, we need to reduce wait time by
> > 90%. We could remove unused_tables as well, but it will be 60% not 90%.
>
> Hmm, if you're only interested in optimizing this specific use case -
> one table, many threads - then yes, maybe. But if you have many tables,
> then modifying per-share lists under the share's own mutex is, basically,
> a must.
>
> Regards,
> Sergei
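
For reference, the partitioning idea quoted above (32 list head pointers, a
thread adding a table only to "this thread's" list, any thread able to remove
a table from any list) could be sketched roughly as follows. This is only an
illustration under assumptions, not code from the server or from MDEV-4956:
Table, Partition, PartitionedTableList and all of their members are
hypothetical names, the 64-byte cache line size is assumed, and picking the
partition by hashing the thread id is just one possible choice.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>

constexpr std::size_t kPartitions = 32;  // "32 list head pointers"

// Hypothetical stand-in for a cached table; not the server's TABLE struct.
struct Table {
  Table *prev = nullptr;
  Table *next = nullptr;
  std::atomic<int> partition{-1};  // index of the list holding it, -1 if none
};

// One list head with its own mutex, aligned to an assumed 64-byte cache line
// so that updating one head does not invalidate the line of its neighbours.
struct alignas(64) Partition {
  std::mutex lock;
  Table *head = nullptr;
};

class PartitionedTableList {
 public:
  // Map the calling thread to "its" partition (assumption: hash of thread id).
  static std::size_t this_thread_partition() {
    return std::hash<std::thread::id>()(std::this_thread::get_id()) %
           kPartitions;
  }

  // A thread adds a table only to its own list.
  void add(Table *t) { add_to(this_thread_partition(), t); }

  // Explicit-partition variant; the table must not be on any list yet.
  void add_to(std::size_t p, Table *t) {
    assert(p < kPartitions && t->partition.load() == -1);
    std::lock_guard<std::mutex> guard(parts_[p].lock);
    t->prev = nullptr;
    t->next = parts_[p].head;
    if (t->next) t->next->prev = t;
    parts_[p].head = t;
    t->partition.store(static_cast<int>(p));
  }

  // Any thread may remove a table from any list. Returns false if the table
  // is on no list, or if it left list p before we got the lock (retry then).
  bool remove(Table *t) {
    int p = t->partition.load();
    if (p < 0) return false;
    std::lock_guard<std::mutex> guard(parts_[p].lock);
    if (t->partition.load() != p) return false;
    unlink(parts_[p], t);
    return true;
  }

  // Take some table off any list, starting the scan at this thread's list.
  Table *pop_any() {
    std::size_t start = this_thread_partition();
    for (std::size_t i = 0; i < kPartitions; ++i) {
      Partition &part = parts_[(start + i) % kPartitions];
      std::lock_guard<std::mutex> guard(part.lock);
      if (Table *t = part.head) {
        unlink(part, t);
        return t;
      }
    }
    return nullptr;
  }

 private:
  // Caller must hold part.lock.
  static void unlink(Partition &part, Table *t) {
    if (t->prev) t->prev->next = t->next; else part.head = t->next;
    if (t->next) t->next->prev = t->prev;
    t->prev = t->next = nullptr;
    t->partition.store(-1);
  }

  std::array<Partition, kPartitions> parts_;
};
```

With this layout a release would call add() and touch only one of the 32
heads, while an acquire or an eviction would call remove() or pop_any() and
may touch any of them. As noted in the quoted reply, this is not complete
partitioning between CPUs; it only avoids having a single global list head
pointer.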

