> From: Brandon Nesterenko <brandon.nestere...@mariadb.com> > > When using semi-sync replication with > rpl_semi_sync_master_wait_point=AFTER_COMMIT, the performance of the > primary can significantly reduce compared to AFTER_SYNC's > performance for workloads with many concurrent users executing > transactions. This is because all connections on the primary share
Hi Brandon, Here is my second round review of the MDEV-33551 patch. I think these comments of mine from first round still need to be addressed: > > A bit of care must be taken about lifetime of Tranx_node and THD > > respectively (in case one goes away but not the other). But the whole > > > +void Repl_semi_sync_master::await_all_slave_replies() > > > + wait_result= cond_timewait(front.thd, &abstime); > > > > I'm wondering, is it safe here to wait on front.thd->COND_wakeup_ready? > > Isn't there a possibility that this might go away during the wait? I tried a simple testcase (attached) which disconnects the session thread while await_all_slave_replies() is running during shutdown. As I expected, I could see the COND_wakeup_ready of the session thread being de-allocated while await_all_slave_replies() is still waking up from that condition. That cannot be correct. Similarly, it seems clearly possible that a session thread is killed and exits while its entry is still present in the m_active_tranxs list, which will similarly lead to signalling a de-allocated COND_wakeup_ready. When a thread aborts the wait before the entry is removed, I think it should simply look up its entry (can use the m_trx_htb hash table, or even just iterate the list) and set its THD to NULL (or remove the entry, whatever is simpler). For the await_all_slave_replies() problem, I think the whole wait-at-shutdown is much too complex as it is now. The whole is_thd_awaiting_semisync_ack(), n_threads_awaiting_ack, special-case not killing threads waiting for semi-sync-ack, and now introducing unmark_thd_awaiting_ack. And it still has the bug waiting on a COND_wakeup_ready of another THD that can disappear during the wait. I think all of that should just go away, the wait in await_all_slave_replies() should be done much simpler with a dedicated condition variable to wait for. Eg. await_all_slave_replies() just does a mysql_cond_wait(COND_binlog_send) under LOCK_binlog until the last binlog position is ack'ed, and this condition is then signalled from report_reply_binlog() (either always, or await_all_slave_replies() can set a flag to request signal). Or have the shutdown thread enqueue its THD at the end of the m_active_tranxs list. Let's avoid all this complex interaction between SHUTDOWN WAIT FOR ALL SLAVES and session threads. BTW, COND_binlog_send is currently unused in your patch. Below detailed comments on individual parts of the patch: > --- a/sql/mysqld.cc > +++ b/sql/mysqld.cc > @@ -1750,18 +1750,12 @@ static void close_connections(void) > - if (shutdown_wait_for_slaves && repl_semisync_master.get_master_enabled()) > + if (shutdown_wait_for_slaves && repl_semisync_master.get_master_enabled() > && > + repl_semisync_master.sync_get_master_wait_sessions()) Do you need to check sync_get_master_wait_sessions() > 0 here? Why not wait unconditionally for the last binlog position sent? > --- a/sql/semisync_master.cc > +++ b/sql/semisync_master.cc > @@ -68,6 +68,14 @@ static ulonglong timespec_to_usec(const struct timespec > *ts) > return (ulonglong) ts->tv_sec * TIME_MILLION + ts->tv_nsec / TIME_THOUSAND; > } > > +int signal_waiting_transaction(THD *waiting_thd, const char *binlog_file, > + my_off_t binlog_pos) > +{ > + if (waiting_thd->is_awaiting_semisync_ack) > + mysql_cond_signal(&waiting_thd->COND_wakeup_ready); The check on is_awaiting_semi_sync_ack seems wrong here. Normally, the THD will be waiting, otherwise it would not be in the list in the first place, so faster with an occasional redundant signal than a conditional always. And if the THD is not waiting, it's not safe to access it anyway. Instead, as mentioned above, have a check for thd != NULL here, and let a non-waiting thread clear the THD in its entry before it releases the LOCK_binlog. > -int Repl_semi_sync_master::cond_timewait(struct timespec *wait_time) > +int Repl_semi_sync_master::cond_timewait(THD *thd, struct timespec > *_wait_time) > + if (_wait_time == NULL) > + { > + create_timeout(&gen_wait_time, NULL); > + real_wait_time= &gen_wait_time; > + } > + else > + { > + real_wait_time= _wait_time; > + } Don't pass in booleans that are always hard-coded true or false in the caller. Instead create two different functions, one that takes a time_spec, and one that does not (the latter creating it and calling the former). > @@ -533,7 +559,8 @@ void Repl_semi_sync_master::remove_slave() > Signal transactions waiting in commit_trx() that they do not have to > wait anymore. > */ > - cond_broadcast(); > + m_active_tranxs->clear_active_tranx_nodes(NULL, NULL, > + signal_waiting_transaction); This does not compile on my machine (the second NULL is passed to my_off_t position), is that a last-minute change that was never tested? Or NULL is defined differently on your machine, so it doesn't fail to compile? BTW, question about the old code: Why didn't the old code clear the m_active_tranxs list? Was it a bug? Or was it cleared elsewhere before (and if so where)? > -void Repl_semi_sync_master::await_slave_reply() > +void Repl_semi_sync_master::await_all_slave_replies() > + The lock is taken per iteration rather than over the whole loop to allow > + more opportunity for the user threads to handle their ACKs. That sounds like a mis-understanding. The lock is always released by mysql_cond_timedwait() in each iteration of the loop, no need to release/take it again for just a couple instructions. > + while (TRUE) > + { > + lock(); > + if (m_active_tranxs->is_empty() || !get_master_enabled() || !is_on()) > + { > + unlock(); > + break; > + } > + wait_result= m_active_tranxs->for_front( > + [](THD *thd, const char *binlog_file, my_off_t binlog_pos) -> int { > + return (thd->is_awaiting_semisync_ack) > + ? repl_semisync_master.cond_timewait(thd, NULL) > + : 0; > + }); As I wrote at the top, it is not safe to wait on the COND_wakeup_ready of another THD, since that THD might go away during the wait. Instead, let's do a specific wait for the end position here, either on COND_binlog or by enqueing an extra THD at the end. (And btw, in your code, why wait for the _front_ of the list and not the _rear_?) > diff --git a/sql/semisync_master.h b/sql/semisync_master.h > index 99f46869354..1eb690b572b 100644 > --- a/sql/semisync_master.h > +++ b/sql/semisync_master.h > > + * The pre_delete_hook parameter is a function pointer that will be invoked > + * for each Active_tranx node, in order, from m_trx_front to log_file_name, > + * e.g. to signal their wakeup condition. > Repl_semi_sync_binlog::LOCK_binlog > + * is held while this is invoked. "from m_trx_front to log_file_name", I don't understand? (Typo s/log_file_name/m_trx_rear/ ?) > +/* > + Element in Repl_semi_sync_master::wait_queue to preserve the state of a > + transaction waiting for an ACK. > +*/ > +typedef struct _semisync_wait_trx { > + const char *binlog_name; > + my_off_t binlog_pos; > + THD *thd; > +} semisync_wait_trx_t; > + This is not longer used (should be removed). > * Input: (the transaction events' ending binlog position) > * trx_wait_binlog_name - (IN) ending position's file name > * trx_wait_binlog_pos - (IN) ending position's file offset > + * unmark_thd_awaiting_ack - (IN) Whether or not to unmark > + * thd->is_awaiting_semisync_ack after > receiving > + * an ACK from the replica. If using wait_point > + * AFTER_SYNC with binlog_group_commit, only > the > + * last transaction written to binlog in the > + * group should negate the boolean, because the > + * same thread (i.e. leader thread) will wait > for > + * all transaction ACKs. Negating it too early > + * would break SHUTDOWN WAIT FOR ALL SLAVES, as > + * that is the condition tested to bypass > killing > + * threads in phase 1. In all other situations, > + * the boolean should be negated immediately. That seems unnecessarily convoluted. I think the real problem here is that write_tranx_in_binlog() sets the is_awaiting_semi_sync_ack flag on the wrong THD, on the waiter_thd instead of the trx_thd. But once we fix the await_all_slave_replies(), do we even need the is_awaiting_semi_sync_ack flag any more? If so, why? If not, then this point becomes moot anyway. - Kristian.
--source include/have_innodb.inc --source include/have_debug.inc --source include/have_binlog_format_mixed.inc --source include/master-slave.inc --connection master CREATE TABLE t1 (a INT PRIMARY KEY, b INT) ENGINE=InnoDB; INSERT INTO t1 VALUES (1,0); --sync_slave_with_master source include/stop_slave.inc; SET @old_semi_enabled= @@GLOBAL.rpl_semi_sync_slave_enabled; SET GLOBAL rpl_semi_sync_slave_enabled = 1; --source include/start_slave.inc --connection master SET GLOBAL rpl_semi_sync_master_enabled= 1; SET GLOBAL rpl_semi_sync_master_timeout= 5000; let $status_var= Rpl_semi_sync_master_clients; let $status_var_value= 1; source include/wait_for_status_var.inc; show status like 'Rpl_semi_sync_master_status'; show status like 'Rpl_semi_sync_master_clients'; --connection slave SET @old_dbug= @@GLOBAL.debug_dbug; SET GLOBAL debug_dbug="+d,simulate_delay_semisync_slave_reply"; --connect(con1, localhost, root,,) --connect(con2, localhost, root,,) --connection con1 send INSERT INTO t1 VALUES (2,0); --sleep 0.3 --disconnect con1 --connection default --write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect wait EOF #let $status_var= Rpl_semi_sync_master_wait_sessions; #let $status_var_value= 1; #source include/wait_for_status_var.inc; --sleep 0.1 --connection con2 SHUTDOWN WAIT FOR ALL SLAVES; --disconnect con2 --connection default --source include/wait_until_disconnected.inc --connection master --source include/wait_until_disconnected.inc --connection server_1 --source include/wait_until_disconnected.inc --connection default --append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect restart EOF --enable_reconnect --source include/wait_until_connected_again.inc --connection master --enable_reconnect --source include/wait_until_connected_again.inc --connection server_1 --enable_reconnect --source include/wait_until_connected_again.inc --save_master_pos --connection slave SET GLOBAL debug_dbug= @old_dbug; SET GLOBAL rpl_semi_sync_slave_enabled= @old_semi_enabled; --source include/stop_slave.inc --source include/start_slave.inc --sync_with_master --connection default DROP TABLE t1; --source include/rpl_end.inc
_______________________________________________ developers mailing list -- developers@lists.mariadb.org To unsubscribe send an email to developers-le...@lists.mariadb.org