Hello Kristian,

I suspect that the poor slave performance with optimistic parallel replication occurs because TokuDB does not implement the kill_query handlerton function. kill_handlerton gets called to resolve the lock waits that occur when replicating a small sysbench table in parallel. InnoDB implements kill_query; TokuDB does not.
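
To make the suspicion concrete, here is a toy model of the effect. It is only a sketch under stated assumptions, not TokuDB or server code: LockWaiter and simulate_conflict are hypothetical names, and the condition variable stands in for the engine's lock manager. It shows a blocked waiter returning promptly when a kill hook wakes it, and sitting out the full lock timeout when no such hook exists.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Toy stand-in for one transaction blocked in an engine's lock manager.
// Hypothetical names throughout; this is not TokuDB or MariaDB code.
struct LockWaiter {
    enum class Result { Granted, Killed, TimedOut };

    // Block until the lock is granted, the waiter is killed, or the timeout expires.
    Result wait(std::chrono::milliseconds lock_timeout) {
        std::unique_lock<std::mutex> lk(mu);
        bool woke = cv.wait_for(lk, lock_timeout,
                                [this] { return granted || killed; });
        if (!woke) return Result::TimedOut;
        return killed ? Result::Killed : Result::Granted;
    }

    // What a kill_query-style hook would do: flag the waiter and wake it.
    void kill() {
        {
            std::lock_guard<std::mutex> lk(mu);
            killed = true;
        }
        cv.notify_all();
    }

    std::mutex mu;
    std::condition_variable cv;
    bool granted = false;
    bool killed = false;
};

// Runs one blocked waiter. `engine_has_kill_hook` models whether the engine
// gives the server a way to interrupt the wait (an assumption of this sketch).
LockWaiter::Result simulate_conflict(bool engine_has_kill_hook,
                                     std::chrono::milliseconds lock_timeout) {
    LockWaiter waiter;
    LockWaiter::Result result = LockWaiter::Result::TimedOut;
    std::thread worker([&] { result = waiter.wait(lock_timeout); });
    if (engine_has_kill_hook) waiter.kill();  // the server asks the engine to abort the wait
    worker.join();
    return result;
}
```

With the hook, the waiter returns as soon as it is killed, so the transaction can roll back and retry. Without it, nothing wakes the waiter before the lock timeout, which would be consistent with stalls of several seconds.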

On Fri, Aug 12, 2016 at 12:47 PM, Rich Prohaska <[email protected]> wrote:

> Hello Kristian,
> I am running your opt2 branch with a small sysbench oltp test (1 table,
> 1000 rows, 8 threads). the good news is that the slave stalls due to lock
> timeouts are gone. the bad news is that the slave performance is suspect.
>
> when slave in conservative mode with 2 threads, the tokudb wait for
> callback is being called (i put in a "printf"), which implies a parallel
> lock conflict. I assumed that conservative mode implies parallel execution
> of transactions that were group committed together, which I assumed would
> imply that these transactions were conflict free. Obviously not the case.
>
> when slave in optimistic mode with 8 threads, i see very high slave query
> execution times in processlist.
>
> +----+-------------+-----------+------+---------+------+------------------------------------------------+------------------+----------+
> | Id | User        | Host      | db   | Command | Time | State                                          | Info             | Progress |
> +----+-------------+-----------+------+---------+------+------------------------------------------------+------------------+----------+
> |  6 | root        | localhost | NULL | Query   |    0 | init                                           | show processlist |    0.000 |
> | 16 | system user |           | NULL | Connect |  383 | Waiting for master to send event               | NULL             |    0.000 |
> | 17 | system user |           | NULL | Connect |    7 | Waiting for prior transaction to commit        | NULL             |    0.000 |
> | 18 | system user |           | NULL | Connect |    3 | Waiting for prior transaction to commit        | NULL             |    0.000 |
> | 19 | system user |           | NULL | Connect |    3 | Waiting for prior transaction to commit        | NULL             |    0.000 |
> | 20 | system user |           | NULL | Connect |    3 | Delete_rows_log_event::find_row(-1)            | NULL             |    0.000 |
> | 21 | system user |           | NULL | Connect |    3 | Waiting for prior transaction to commit        | NULL             |    0.000 |
> | 22 | system user |           | NULL | Connect |    3 | Waiting for prior transaction to commit        | NULL             |    0.000 |
> | 23 | system user |           | NULL | Connect |    7 | Waiting for prior transaction to commit        | NULL             |    0.000 |
> | 24 | system user |           | NULL | Connect |    3 | Waiting for prior transaction to commit        | NULL             |    0.000 |
> | 25 | system user |           | NULL | Connect |  382 | Waiting for room in worker thread event queue  | NULL             |    0.000 |
>
> It appears that there is some MULTIPLE SECOND STALL somewhere. gdb shows
> that the threads are either
> (1) waiting in the tokudb lock manager, or
> (2) waiting in the wait_for_commit::wait_for_prior_commit2 function.
>
> On Fri, Aug 12, 2016 at 8:50 AM, Kristian Nielsen <[email protected]> wrote:
>
>> [Moving the discussion to maria-developers@, hope that is ok/makes
>> sense...]
>>
>> Ok, so here is a proof-of-concept patch for this, which seems to make TokuDB
>> work with optimistic parallel replication.
>>
>> The core of the patch is this line in lock_request.cc
>>
>>     lock_wait_callback(callback_data, m_txnid, conflicts.get(i));
>>
>> which ends up doing this:
>>
>>     thd_report_wait_for (requesting_thd, blocking_thd);
>>
>> All the rest of the patch is just getting the right information around
>> between the different parts of the code.
>>
>> I put this on top of Jocelyn Fournier's tokudb_rpl.rpl_parallel_optimistic
>> patches, and pushed it on my github:
>>
>>   https://github.com/knielsen/server/tree/toku_opr2
>>
>> With this patch, the test case passes! So that's promising.
>>
>> Some things still left to do for this to be a good patch:
>>
>>  - I think the callback needs to trigger also for an already waiting
>>    transaction, in case another transaction arrives later to contend for the
>>    same lock, but happens to get the lock earlier. I can look into this.
>>
>>  - This patch needs linear time (in number of active transactions) per
>>    callback to find the THD from the TXNID, maybe that could be optimised.
>>
>>  - Probably the new callback etc. needs some cleanup to better match TokuDB
>>    code organisation and style.
>>
>>  - And testing, of course. I'll definitely need some help there, as I'm not
>>    familiar with how to run TokuDB efficiently.
>>
>> Any thoughts or comments?
>>
>>  - Kristian.
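
On the linear-time TXNID lookup in the list above, one possible direction is a hash map maintained at transaction begin and end. The following is only a sketch under assumptions: TxnThdRegistry is a hypothetical name, THD is left opaque, a 64-bit transaction id is assumed, and nothing here reflects how the patch or TokuDB actually tracks transactions.

```cpp
#include <cstdint>
#include <mutex>
#include <unordered_map>

struct THD;  // opaque here; only the pointer is stored

// Hypothetical registry mapping a transaction id to the THD that owns it,
// so the lock-wait callback can find the blocking THD in expected O(1).
class TxnThdRegistry {
public:
    // Call when a transaction starts.
    void add(uint64_t txnid, THD *thd) {
        std::lock_guard<std::mutex> lk(mu_);
        map_[txnid] = thd;
    }

    // Call when a transaction commits or aborts.
    void remove(uint64_t txnid) {
        std::lock_guard<std::mutex> lk(mu_);
        map_.erase(txnid);
    }

    // Returns nullptr if the transaction is unknown or already finished.
    THD *find(uint64_t txnid) const {
        std::lock_guard<std::mutex> lk(mu_);
        auto it = map_.find(txnid);
        return it == map_.end() ? nullptr : it->second;
    }

private:
    mutable std::mutex mu_;
    std::unordered_map<uint64_t, THD *> map_;
};
```

The trade-off is an extra mutex acquisition on every transaction begin and end in exchange for a cheaper callback; whether that is worthwhile depends on how often lock conflicts actually occur.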

