Hi Kristian!

On Mon, May 2, 2016 at 2:10 PM, Kristian Nielsen <[email protected]> wrote:
> Nirbhay Choubey <[email protected]> writes:
>
> [Cc: maria-developers@, please always keep these discussions on the
> mailing list]
>
> > In Galera cluster, the state transfer scripts perform FTWRL and
> > copy data along with the last of all available binlog files to the
> > joiner node.
> >
> > After MDEV-181, I understand that the binlog checkpoint can be
> > in any of the binary log files (and not necessarily the last one).
> >
> > This seemingly has caused MDEV-9423, in which the joiner node
> > complains of the missing binlog file.
> >
> > Now the question is : Is FTWRL not sufficient to ensure that the
> > checkpoint is always the last binlog file?
>
> So if I understand correctly, the issue is related to having binlog files
> available during XA crash recovery. When the binlog file is rotated, there
> is a small window where both the latest and the previous binlog files are
> needed for crash recovery. The binlog checkpoint is the earliest binlog
> file that is needed for crash recovery, and it can be seen from the binlog
> checkpoint event.
>
> So the problem here is that a copy is made just after binlog rotation, and
> Galera only copies the most recent, mostly-empty binlog file, leaving
> insufficient information for XA recovery, right?
>

Correct.

>
> One option to solve this is to always copy the last two binlog files. While
> it is theoretically possible to have the binlog checkpoint more than two
> files back, I think it will not occur in practice.
>
> Another option is to wait for the binlog checkpoint to reach the current
> binlog file. You can see this done in the test suite:
>
>     mysql-test/include/wait_for_binlog_checkpoint.inc
>
> The binlog checkpointing happens asynchronously. I *think* it can complete
> even while FTWRL is active, but I am not 100% sure.
>
> The checkpoint happens after InnoDB has made its commits durable with
> fsync() or similar - only after that is it safe to discard the old binlog
> data and still have correct crash recovery.
>

While copying the last 2 binlog files would have solved this, I have worked
out a solution where the donor node waits for the binlog checkpoint event for
the last binlog file to be logged before proceeding with the file transfer.

http://lists.askmonty.org/pipermail/commits/2016-June/009483.html

By the way, I initially tried reusing is_xidlist_idle_nolock()/COND_xid_list
to implement the waiting mechanism. But since binlog checkpoint events are
written asynchronously after xid_count falls to 0, that did not work. So I
later came up with the above patch.

Best,
Nirbhay

>
>  - Kristian.
>
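
For readers following along, here is a rough client-side sketch of the "wait for the binlog checkpoint to reach the current binlog file" idea discussed above. It is not the linked patch, which does the waiting inside the server. It only illustrates the check, under these assumptions: rows have the column order of SHOW BINLOG EVENTS (Log_name, Pos, Event_type, Server_id, End_log_pos, Info), the Info column of a Binlog_checkpoint event carries the checkpoint file name, and the names checkpoint_reached, wait_for_checkpoint and fetch_rows are hypothetical helpers made up for this example.

```python
import time
from typing import Callable, Iterable, Sequence

# Assumed row shape, mirroring the columns of SHOW BINLOG EVENTS:
# (Log_name, Pos, Event_type, Server_id, End_log_pos, Info)
Row = Sequence[object]


def checkpoint_reached(rows: Iterable[Row], current_binlog: str) -> bool:
    """Return True if some Binlog_checkpoint event in rows names current_binlog."""
    for row in rows:
        event_type, info = row[2], row[5]
        if event_type == "Binlog_checkpoint" and current_binlog in str(info):
            return True
    return False


def wait_for_checkpoint(fetch_rows: Callable[[str], Iterable[Row]],
                        current_binlog: str,
                        attempts: int = 300,
                        delay: float = 0.1) -> bool:
    """Poll until the checkpoint reaches current_binlog or attempts run out.

    fetch_rows is a hypothetical callable that returns the events of the
    given binlog file, e.g. by running SHOW BINLOG EVENTS IN '<file>'.
    """
    for _ in range(attempts):
        if checkpoint_reached(fetch_rows(current_binlog), current_binlog):
            return True
        time.sleep(delay)
    return False
```

A donor-side script could call wait_for_checkpoint() with the file name reported by SHOW MASTER STATUS and start copying only once it returns True. If it returns False, copying the last two binlog files, as suggested in the quoted text, remains a possible fallback.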
_______________________________________________
Mailing list: https://launchpad.net/~maria-developers
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp

