Hi Kristian, On Sat, Jun 25, 2016 at 3:57 AM, Kristian Nielsen <[email protected]> wrote:
> Nirbhay Choubey <[email protected]> writes: > > >> Also, it seems reasonable that FTWRL in general could wait for > checkpoint > >> events so that other backup mechanisms similarly could avoid binlog > files > > > That sound good to me. But, considering Percona's backup locks, it seems > > more logical to > > implement this in Backup locks instead, whenever they get > > ported/implemented in MariaDB. > > Right. As I was thinking about the problem, it occured to me that this > wasn't really a Galera-specific thing, my suggestion seemed a valid general > wait-for-checkpoint mechanism. > > So we should put the code that waits for checkpoint in its own function > (as you already did, MYSQL_BIN_LOG::wait_for_last_checkpoint_event()). But > I > agree, we can wait with actually exposing it (in FTWRL, backup locks, > whatever) until when/if that becomes relevant/priority. > > I would just note that this wait does not really do anything unless there > is > something else (like FTWRL in your case) that prevents new commits, > otherwise a new checkpoint could become pending at any time after > wait_for_last_checkpoint_event() returns. > > > Also, in this particular case, the problem lies > > in reload_acl_and_cache(REFRESH_BINARY_LOG), > > (executed after FTWRL while preparing for SST) that rotates the binary > log. > > Hm, I see. So you're always copying an empty binlog file? I'm wondering why > you don't simply don't copy any binlogs and just start the new server with > --tc-heuristic-recover=ROLLBACK ... maybe copying binlogs was just > considered easier? Anyway, I don't have the bigger picture, so can't have > much of an informed opinion here. > The joiner node also picks up the GTID state from the binary log file it received. > > Yes, it worked. But, to solve this issue in 10.1, I have added this wait > to > > REFRESH_BINARY_LOG > > (as explained above) only when the server is acting as a Galera node. > > That seems quite ugly, why not call it from the SST code, after it has > called reload_acl_and_cache()? You're basically making FLUSH LOGS behave > differently in Galera and non-Galera (if my understanding is correct), > which > might lead to subtle bugs? I initially thought of adding the call after reload_acl_and_cache(), but there could still be a case when user performs a REFRESH_BINARY_LOG before LOCK_log is acquired. > But again, I don't have the bigger picture, and > the whole wsrep patch is garbage all over the server anyway, so I suppose > it > doesn't matter much to me, as long as it's #ifdef WSREP. > > >> and it also makes the extra lock/unlock of LOCK_log above redundant. > > > > > > Not quite. The wait logic (that includes LOCK_log, as the snippet above) > is > > to pause > > REFRESH_BINARY_LOG and an additional use of LOCK_log to block the RESET/ > > FLUSH commands while file transfer is in progress. > > Sure, it's fine to have both, probably makes the code clearer anyway. > Right. > > > --- a/sql/log.cc > > +++ b/sql/log.cc > > @@ -3690,7 +3690,10 @@ bool MYSQL_BIN_LOG::open(const char *log_name, > > new_xid_list_entry->binlog_id= current_binlog_id; > > /* Remove any initial entries with no pending XIDs. */ > > while ((b= binlog_xid_count_list.head()) && b->xid_count == 0) > > + { > > my_free(binlog_xid_count_list.get()); > > + mysql_cond_broadcast(&COND_xid_list); > > + } > > binlog_xid_count_list.push_back(new_xid_list_entry); > > mysql_mutex_unlock(&LOCK_xid_list); > > There is no need to mysql_cond_broadcast() multiple times. Use just a > single > broadcast outside the loop (before or after, doesn't make a difference). > Fixed. Best, Nirbhay > > - Kristian. >
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : [email protected] Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp

