Krisitan, The attached patch doesn't work properly because the format description event it saves is different from what master has sent. It has different number_of_event_types and potentially it has garbage (or incomplete data) in post_header_len.
And regarding the fix to Gtid_log_event that we were talking about: I've tried to do that but I felt like I don't understand the code quite well at the moment to do that. So apparently all peek*() functions should have Format_description added as parameter and it's easy to pass one from queue_event(), but then e.g. mysql_binlog_send() will need description event too to pass it to send_event_to_slave() and is_until_reached() so that those could pass it to peek*(). For correctness this format description should be read from binlog, right? That would be pretty non-trivial to do. Also gtid_state_from_pos() needs Format_descirption, but it will apparently meet that at the beginning of the binlog file, so it can be created before it will be actually needed. Although constructor of Format_description_log_event needs another instance of Format_description. Should it be just created like Format_description_log_event(BINLOG_VERSION)? Pavel On Thu, Aug 29, 2013 at 6:24 AM, Kristian Nielsen <[email protected]> wrote: > Pavel Ivanov <[email protected]> writes: > >>> But this must be the same problem with normal replication? Whenever the >>> slave >>> decides to rotate the relay log, it will write a format description event >>> created by itself with no following format description created on the >>> master. So it seems this must work somehow, though I'll frankly admit I do >>> not >>> understand the details of how this works (do you know?) >> >> Yes, I investigated this. During normal replication (when relay log is >> rotated automatically due to max_size) slave's format description is >> written at the beginning of the new realy log file, but right after >> that there's code that if description_event_for_queue->binlog_version >>>= 4 then it writes description_event_for_queue into relay log too. >> Also it ensures that the event has created = 0 and artificial_event >> set to 1. So SQL thread still gets the master's format description and >> doesn't rollback the transaction. > >> When IO thread reconnects to master the first event it receives is >> Rotate. For Rotate event queue_event() executes process_io_rotate(). >> Inside there if >> mi->rli.relay_log.description_event_for_queue->binlog_version >= 4 it >> forcefully replaces description_event_for_queue with new event with >> binlog_version = 3. Then it does the actual relay log rotation during >> which description_event_for_queue is not written into the new log file >> (and it shouldn't as it's not master's at this point anyway). The next > > I see. > > So one possible solution is to do the same at the reconnect case as what we do > in relay-log rotate initiated by slave due to size: Write out the > description_event_for_queue to the relay log with created=0 and artificial=1. > > I have attached a patch for this, what do you think? > > Do you have the possibility to test if this works (eg. when we get a reconnect > when the master's description event is incompatible with the slave's)? > > [This code really is criminally ugly, even for replication standard. But I do > not really know how to fix it in any reasonable way :-(] > >> Maybe I messed up with testing, I'll try to retest again... > > FYI: I took unmodified 10.0-base, commented out those two lines, ran > rpl.rpl_gtid_reconnect, and it failed. > > Thanks, > > - Kristian. > _______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : [email protected] Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp

