Hi! >>>>> "Kristian" == Kristian Nielsen <kniel...@knielsen-hq.org> writes:
Kristian> Michael Widenius <mo...@askmonty.org> writes: >> What should happen if you kill a replication thread is that >> replication should stop for that master. <cut> Kristian> It seems to make sense that KILL CONNECTION on the SQL thread would stop the Kristian> replication. I do not know whether this is how it works or not in current Kristian> replication, but whatever it is, we should just leave the behaviour the same. This is how things are now (as far as I know). <cut> Kristian> But KILL of a worker thread should _not_ cause the thread to exit, I Kristian> think. The pool of parallel threads is just a thread pool, and for simplicity Kristian> in the first version, it has a fixed size. A KILL will disconnect the thread Kristian> from the replication connection it was servicing, but the thread must remain Kristian> alive, ready to serve another connection if needed. Correct. The KILL command only kills the query or the connection (ie, THD). The thread is always reused. Kristian> So the rpt->stop is only about maintaining the thread pool for parallel Kristian> replication, not about anything to do with aborting any specific master Kristian> connection replication. In rpl_parallel_change_thread_count() we set rpt->stop Kristian> to ask all existing threads to exit, and afterwards spawn a new set of Kristian> threads. This can only happen when all replication SQL/IO threads are stopped Kristian> and no event execution is taking place. Yes. The questions is when rpt->stop should take effect. My suggestion is that the replication threads should only examine rpt->stop after commits. This way we can ensure that when all threads are stopped, everything is committed to a certain point and we never have to do a rollback. Kristian> So this code of mine is actually wrong, I think: Kristian> while (!rpt->stop && !thd->killed) Kristian> It should just be while (!rpt->stop) { ... }. Kristian> And thd->killed should not be used at all to control thread Kristian> termination. Instead, it should be used (and is not currently, needs to be Kristian> fixed) in various places for event execution to allow aborting currently Kristian> executing event. This is needed around rpt_handle_event() I think, and also Kristian> around queue_for_group_commit(). Yes. I will be working on this today & tomorrow. Note that every loop where we wait needs to be stoppable, either with 'stop' or with KILL. Kristian> For normal stop, this is handled mostly in the SQL thread. It waits with a Kristian> timeout for the current event group to finish normally, then does a hard kill Kristian> if needed. This needs to be extended to handle running things in a worker Kristian> thread, of course. The code is around sql_slave_killed(), I think. Kristian> Does that make sense? Makes sence. I will know more tomorrow when I have dug into the code a bit more. >>>> + mysql_mutex_lock(&LOCK_wait_commit); >>>> + waiting_for_commit= false; >>>> + mysql_cond_signal(&COND_wait_commit); >>>> + mysql_mutex_unlock(&LOCK_wait_commit); >>>> +} >>>> >>>> In this particular case it looks safe to move the cond signal out of >>>> the mutex. I don't see how anyone could miss the signal. >> Kristian> Probably. Generally I prefer to write it as above, to not have to make sure Kristian> that moving it out is safe. >> >> I agree that this is the way to do in general. >> However for cases where we are waiting almost once per query, we need >> to make things faster as this extra wakup will take notable resources! Kristian> Ok, no problem, I've changed it. Kristian> I should benchmark this, it keeps coming up. It's not clear to me which way Kristian> would be the fastest, seems it would depend on the implementation. I did a benchmark of this a long time ago (+10 years) on Linux and then using the signal after unlock was notable faster. The gain is two thread switches + one thread wakeup per cond_signal. >> Hi! >> >> Part 2 of review of parallel replication >> >> === added file 'sql/rpl_parallel.cc' >> >> <cut> >> >> +pthread_handler_t >> +handle_rpl_parallel_thread(void *arg) >> +{ >> >> <cut> >> >> + if (end_of_group) >> + { >> + in_event_group= false; >> + >> >> Add comment: >> >> /* >> All events for this group has now been executed and logged (but not >> necessaerly synced). >> Inform the other event groups that are waiting for this thread that >> we are done. >> */ >> >> + rgi->commit_orderer.unregister_wait_for_prior_commit(); >> + thd->wait_for_commit_ptr= NULL; >> + >> + /* >> + Record that we have finished, so other event groups will no >> + longer attempt to wait for us to commit. >> + >> + We can race here with the next transactions, but that is fine, as >> + long as we check that we do not decrease last_committed_sub_id. If >> + this commit is done, then any prior commits will also have been >> + done and also no longer need waiting for. >> + */ >> + mysql_mutex_lock(&entry->LOCK_parallel_entry); >> + if (entry->last_committed_sub_id < event_gtid_sub_id) >> + { >> + entry->last_committed_sub_id= event_gtid_sub_id; >> + mysql_cond_broadcast(&entry->COND_parallel_entry); >> + } >> + mysql_mutex_unlock(&entry->LOCK_parallel_entry); >> + >> >> Add: >> >> /* >> Tell storage engines that they can stop waiting. >> See comments in sql_class.cc::wakeup_subsequent_commits2() for why we can't >> do this part of the wakeup in unregister_wait_for_prior_commit(). >> */ Kristian> I think there is some misunderstanding here. I added some comments that will Kristian> hopefully clarify, but the situation is a bit different than how I read this Kristian> suggestion. I may have missunderstood some of the code. Then it's even more important to get a good comment! Kristian> The situation is that we have for example three threads, A, B, and C. We are Kristian> thread B at this point in the code. Thread B needed to wait for A to commit Kristian> first. C needs to wait for B to commit. Kristian> The call to unregister_wait_for_prior_commit() relates to B waiting for A, it Kristian> cannot be used in relation to C waiting for B, irrespectively of comments in Kristian> the function wakeup_subsequent_commits2() ... ok. I thought that when we called unregister_wait_for_prior_commit() we had already handled the event including logging. (Apparently rpt_handle_event() does less than what I expected). Kristian> The unregister_wait_for_prior_commit() removes B from the list of threads Kristian> waiting for A. This is usually redundant, because B already removed itself Kristian> with wait_for_prior_commit() during the commit step. However, if there is for Kristian> example an error we might not reach commit, and thus we need to remove the Kristian> dangling entry in A. ok, this was a bit unclear. I thought that B could not stop waiting for A until A had written everything to the binary log. I don't see how we B can stop waiting for A until A at least has done all commands and written all it's logs. Kristian> The wakeup_subsequent_commits() signals C that B is done. Again, this is often Kristian> redundant, because we already signalled C during B's commit step. However, C Kristian> could have registered to wait for B between B's commit and this place in the Kristian> code, so it is needed for correctness. This part is clear. <cut> >> /* >> The following loops wait until each thread is done it's work for each >> event group and then stops it when the thread has nothing to do. >> >> Note that this may take some time as rpl_paralell:do_event() may request >> threads at the same time. >> */ Kristian> I added this comment instead: Kristian> /* Kristian> Grab each old thread in turn, and signal it to stop. Kristian> Note that since we require all replication threads to be stopped before Kristian> changing the parallel replication worker thread pool, all the threads will Kristian> be already idle and will terminate immediately. Kristian> */ Kristian> Hope that clarifies things. Yes, it does. Thanks. >> +void >> +rpl_parallel::wait_for_done() >> +{ >> + struct rpl_parallel_entry *e; >> + uint32 i; >> + >> + for (i= 0; i < domain_hash.records; ++i) >> + { >> + e= (struct rpl_parallel_entry *)my_hash_element(&domain_hash, i); >> + mysql_mutex_lock(&e->LOCK_parallel_entry); >> + while (e->current_sub_id > e->last_committed_sub_id) >> + mysql_cond_wait(&e->COND_parallel_entry, &e->LOCK_parallel_entry); >> + mysql_mutex_unlock(&e->LOCK_parallel_entry); >> + } >> +} >> >> How to handle errors in the above case? >> For example when we get an error and we had to abort things? Kristian> I think even if we have to abort event execution in a worker thread, we should Kristian> still increment e->last_committed_sub_id. So the wait here is ok. But we do Kristian> need code to be able to force all worker threads to stop. Kristian> An important piece of this puzzle is in sql_slave_killed(). I do not Kristian> understand the full details of this code yet, but I believe the idea is to Kristian> wait with some timeout for event groups to finish normally, then hard kill if Kristian> the timeout triggers (I vaguely remember that for InnoDB we can just kill and Kristian> rollback, while for MyISAM the timeout is necessary to avoid leaving half an Kristian> update executed). We could extend this to check if we are using transactional tables or not and wait longer if not transaction tables has been used for the query. Kristian> We need to extend sql_slave_killed() to also handle waiting for each worker Kristian> thread to stop, with the timeout. Once we have that, it is hopefully clear how Kristian> to do wait_for_done(); maybe it can call sql_slave_killed(), or maybe it is no Kristian> longer needed (wait_for_done() is just a temporary thing I put in to get Kristian> things working for the benchmark). Kristian> We also need to extend the API around struct wait_for_commit to include error Kristian> handling. I suppose wait_for_prior_commit() should check for thd->killed when Kristian> it waits, and be extended to return an error if killed. And then also Kristian> wakeup_subsequent_commits() should be extended to take an error argument, so Kristian> that if B is waiting for A, and A gets an error, then A can call Kristian> wakeup_subsequent_commits() with error=true, which will cause B to wake up but Kristian> also get an error. Yes, that sounds right. Kristian> Then if we have a number of event groups processing in parallel, A-> B->C->D. And B ends up with an error. Then A can complete, but B will Kristian> signal an error to C (and then to D), so both B, C, and D will fail. One problem we have is that if B, C ,D are using non transactional tables, it would be better to have them complete than abort. This issue we can leave for now. Kristian> Then somewhere, perhaps around the end of apply_event_and_update_pos(), we Kristian> must check an error flag set for B, C, and D, and signal the error back to the Kristian> main SQL thread so that it can halt and print the error to the error log, as Kristian> usual. Kristian> Something like that, I think. Some work to do, but should be possible at least. Yes. Not too hard, I think. >> The other way would be to, on error, set last_committed_sub_id back to >> it's last value. >> What do you suggest? >> Assuming setting the value back would be the easiest solution.. Kristian> Yes (if I understood correctly what you meant with "setting back")... Kristian> I think we should update things the same way whether the transaction commits Kristian> successfully or fails, so we get the same wakeup of later transactions, but Kristian> with the extra error flag added to wait_for_commit so that later transactions Kristian> can see the failure and thus fail and rollback themselves. And then the error Kristian> can be propagated up the call stack, as normal. Kristian> Of course, this becomes more complex due to the need to be able to re-try a Kristian> transaction in case of certain temporary errors like deadlocks and so. Kristian> Hopefully something can be made to work. As we should never get a deadlock or temporary error on a slave, this should not be too hard for a first version. >> + /* >> + We are already executing something else in this domain. But the two >> + event groups were committed together in the same group commit on the >> + master, so we can still do them in parallel here on the slave. >> + >> + However, the commit of this event must wait for the commit of the >> prior >> + event, to preserve binlog commit order and visibility across all >> + servers in the replication hierarchy. >> + */ >> + rpl_parallel_thread *rpt= global_rpl_thread_pool.get_thread(e); >> >> Here we could add a test if get_thread() is 0, which could happen on shutdown >> or when killing a replication thread. Kristian> I do not think we need to do this. I made it so that the worker thread pool Kristian> can never change size without stopping all replication threads first, so we Kristian> avoid a lot of these issues. And as I wrote above, killing a worker thread Kristian> should terminate the event group that it is currently servicing (and the Kristian> associated SQL thread and master connection), but not remove the thread from Kristian> the pool. I was thinking about the case where all worker threads are occupied for a long time. In this case we will get do_event() stop a long time until a new thread is free. I do see some small advantages in beeing able to free the sql-thread early. If get_thread() is not a stop point, then we have the problem at terminate, that do_event() will wait for get_thread() and then schedule a new query() even if the server should be stopping. We need to check for the sql-thread being killed before it tells a worker thread to start with new queries... Kristian> (I can imagine that in a later version we want to implement a smarter worker Kristian> thread manager, but for the first version I tried to keep things simple, there Kristian> are plenty of other problems that we need to tackle...) >> I think it's relatively safe to use get_thread() as a end-synchronize >> point as we should never have anything half-done when we call >> get_thread()). As get_thread() can take a very long time and is not >> called in many places, it's also sounds like a resonable thing to do. Kristian> Sorry, I do not understand. "End-synchronize point" for what? Sorry, wrong word. What I meant was a point where it's safe to detect that the server or replication thread has been killed. >> + /* >> + Events like ROTATE and FORMAT_DESCRIPTION. Do not run in worker >> thread. >> + Same for events not preceeded by GTID (we should not see those >> normally, >> + but they might be from an old master). >> + */ >> + qev->rgi= serial_rgi; >> + rpt_handle_event(qev, NULL); >> + delete_or_keep_event_post_apply(rli, typ, qev->ev); >> >> Why is delete_or_keep_event_post_apply() not part of rpt_handle_event()? >> You always call these two together... Kristian> It is to share code with the non-parallel case. Kristian> When parallel replication is disabled, everything is done in the SQL thread in Kristian> slave.cc. Like this: Kristian> exec_res= apply_event_and_update_pos(ev, thd, serial_rgi, NULL); Kristian> delete_or_keep_event_post_apply(serial_rgi, typ, ev); Kristian> Or did you mean that I could move the call of Kristian> delete_or_keep_event_post_apply() into the function rpt_handle_event()? Kristian> That might be possible, though it was not 100% clear to me as there are a few Kristian> differences in how it is done in the different cases. Yes, I meant moving the function into rpt_handle_event(). Kristian> But in any case that part of the code needs more work. Especially with regard Kristian> to error handling, and related to updating the last executed event Kristian> position. Probably it makes sense to revisit this as part of that work. ok. Regards, Monty _______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp