Hi!

On Fri, Nov 14, 2014 at 3:58 AM, Kristian Nielsen <[email protected]> wrote:
> Nirbhay Choubey <[email protected]> writes:
>
> >> > ##### Case 7: Stop slave into the middle of a transaction being filtered
> >> > #          and start it back with filtering disabled.
> >> >
> >> > --echo # On master
> >> > connection master;
> >> > SET @@session.gtid_domain_id=1;
> >> > BEGIN;
> >> > INSERT INTO t2 VALUES(3);
> >> > INSERT INTO t3 VALUES(3);
> >> > sync_slave_with_master;
> >>
> >> No, this does not work. Transactions are always binlogged as a whole on
> >> the master, during COMMIT.
> >
> > You are right. My original intent was to test a transaction which modifies
> > both MyISAM and InnoDB tables, where the first modification is done in the
> > MyISAM table. In that case the changes to MyISAM are sent to the slave
> > right away, while the rest of the transaction is sent on commit. I have
> > modified the test accordingly.
>
> I'm still not sure you understand the scenario I had in mind. It's not about
> what happens on the master during the transaction. It is about what happens
> in case the slave disconnects in the middle of receiving an event
> group/transaction.

You are perhaps looking at an older version of the test. The latest says:

<cut>
##### Case 7: Stop slave before a transaction (involving MyISAM and InnoDB
#             tables) being filtered commits, and start it back with
#             filtering disabled.
...
</cut>

> In general in replication, the major part of the work is not implementing
> the functionality for the normal case - that is usually relatively easy. The
> major part is handling and testing all the special cases that can occur in
> special scenarios, especially various error cases. The replication code is
> really complex in this respect, and the fact that things by their nature
> happen in parallel between different threads and different servers makes
> things even more complex.
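As an aside, the master-side behaviour discussed above (the MyISAM change
reaching the slave before the transaction commits) can be sketched with a toy
model. This is not MariaDB code, just an illustration of the binlog-cache
idea: non-transactional statements are written to the binary log immediately,
wrapped in their own BEGIN/COMMIT, while transactional statements are cached
until COMMIT.

```python
# Toy model (NOT MariaDB code) of how a mixed MyISAM/InnoDB transaction
# reaches the binary log.

binlog = []          # events the slave IO thread would see, in order
trx_cache = []       # transactional statement cache, flushed on COMMIT

def execute(stmt, transactional):
    if transactional:
        trx_cache.append(stmt)       # InnoDB change: buffered until COMMIT
    else:
        # MyISAM change: written out right away, before the InnoDB
        # part of the transaction commits
        binlog.extend(["BEGIN", stmt, "COMMIT"])

def commit():
    binlog.extend(["BEGIN", *trx_cache, "COMMIT"])
    trx_cache.clear()

# The Case 7 shape: the first statement hits a MyISAM table (t2), the
# second an InnoDB table (t3).
execute("INSERT INTO t2 VALUES(3)", transactional=False)  # MyISAM
execute("INSERT INTO t3 VALUES(3)", transactional=True)   # InnoDB
commit()

# The MyISAM insert appears in the stream before the InnoDB group.
print(binlog)
```

In this model the slave can receive the t2 change while the rest of the
transaction is still open on the master, which is what the revised Case 7
relies on.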
> What I wanted you to think about here is what happens if the slave is
> disconnected from the master after having received the first half of an
> event group, for example due to a network error. This will not happen
> normally in a mysql-test-case run, and if it happens in a production site
> for a user, it will be extremely hard to track down.
>
> In this case, the second half of the event group could be received much
> later than the first half. The IO thread could have been stopped (or even
> the whole mysqld server could have been stopped) in-between, and the
> replication could have been re-configured with CHANGE MASTER. Since the IO
> thread is doing the filtering, it seems very important to consider what
> will happen if e.g. filters are enabled while receiving the first half of
> the transaction, but disabled while receiving the second half.
>
> Suppose we have this transaction:
>
>     BEGIN GTID 2-1-100
>     INSERT INTO t1 VALUES (1);
>     INSERT INTO t1 VALUES (2);
>     COMMIT;
>
> What happens in the following scenario?
>
>     CHANGE MASTER TO master_use_gtid=current_pos, ignore_domain_ids=(2);
>     START SLAVE;
>     # slave IO thread connects to master
>     # slave receives: BEGIN GTID 2-1-100; INSERT INTO t1 VALUES (1);
>     # slave IO thread is disconnected from master
>     STOP SLAVE;
>     # slave mysqld process is stopped and restarted
>     CHANGE MASTER TO master_use_gtid=no, ignore_domain_ids=();
>     START SLAVE;
>     # slave IO thread connects to master
>     # slave IO thread receives: INSERT INTO t1 VALUES (2); COMMIT;
>
> Are you sure that this will work correctly? And what does "work correctly"
> mean in this case? Will the transaction be completely ignored? Or will it
> be completely replicated on the slave? The bug would be if the first half
> were ignored, but the second half still written into the relay log.
>
> To test this, you would need to use DBUG error insertion. There are already
> some tests that do this.
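To make the failure mode concrete, here is a toy Python model of that
scenario. It is not the real IO-thread code; the filtering logic and the
in-memory `state` dict are assumptions for illustration only. A naive
implementation that consults the current ignore list per event, and whose
filtering state is lost across a mysqld restart, produces exactly the bug
described: the first half of the transaction is dropped, but the second half
still lands in the relay log.

```python
# Toy model (NOT the real IO thread) of domain filtering across a
# mid-event-group reconnect.

def receive(events, ignore_domains, state):
    """Append each non-filtered event to a relay log fragment.

    state remembers whether the group currently being received is
    filtered, so non-GTID events of an ignored group are dropped too.
    In this naive model the state lives only in memory, so it is lost
    if the IO thread (or the whole server) restarts.
    """
    relay_log = []
    for ev in events:
        if ev.startswith("BEGIN GTID"):
            domain = int(ev.split()[2].split("-")[0])
            state["filtering"] = domain in ignore_domains
        if not state.get("filtering", False):
            relay_log.append(ev)
        if ev == "COMMIT":
            state["filtering"] = False
    return relay_log

group = ["BEGIN GTID 2-1-100",
         "INSERT INTO t1 VALUES (1)",
         "INSERT INTO t1 VALUES (2)",
         "COMMIT"]

# First connection: ignore_domain_ids=(2); disconnect after two events.
state = {}
first = receive(group[:2], ignore_domains={2}, state=state)

# mysqld is stopped and restarted: the in-memory filtering state is gone.
state = {}

# Second connection: ignore_domain_ids=(); the rest of the group arrives.
second = receive(group[2:], ignore_domains=set(), state=state)

# Relay log now holds only the second half of the transaction.
print(first + second)   # ['INSERT INTO t1 VALUES (2)', 'COMMIT']
```

So "working correctly" has to mean an all-or-nothing outcome for the whole
event group, and the tests need to check that the implementation actually
achieves it across reconnects and reconfiguration.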
> They use, for example:
>
>     SET GLOBAL debug_dbug="+d,binlog_force_reconnect_after_22_events";
>
> The code will then (in debug builds) simulate a disconnect at some
> particular point in the replication stream, allowing this rare but
> important case to be tested. This is done using DBUG_EXECUTE_IF() in the
> code.

I had already added multiple cases under rpl_domain_id_filter_io_crash.test
using DBUG_EXECUTE_IF("kill_io_slave_before_commit", ...) in the previous
commit. Although it is not exactly the scenario you suggest, it does try to
kill the I/O thread when it receives a COMMIT/XID event (cases 0 - 3), in
order to test what happens when the I/O thread exits before reading the
complete transaction or group, with filtering enabled before/after a slave
restart. Following your suggestion, I have now added 2 more cases (4 and 5)
using DBUG_EXECUTE_IF("kill_slave_io_after_2_events", ...) to kill the I/O
thread after reading the first INSERT in a transaction. The outcome is as
expected.

> To work on replication without introducing nasty bugs, it is important to
> think through cases like this carefully, and to convince yourself that
> things will work correctly. Disconnects at various points, crashes on the
> master or slave, errors during applying events or writing to the relay
> logs, and so on.

I agree.

> Hope this helps,

Indeed.

Best,
Nirbhay
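For readers following along, the count-based kill points above work roughly
like this. The sketch below is a toy Python model, not MariaDB's DBUG
facility; the keyword name "kill_slave_io_after_n_events" and the helper
functions are invented for illustration. The idea is that a debug-only hook
in the receive loop counts events and forces a "disconnect" at an exact point
in the stream, so a test can reproduce the half-received event group on
demand.

```python
# Toy sketch (NOT MariaDB's DBUG package) of count-based error insertion.

debug_keywords = set()   # stands in for SET GLOBAL debug_dbug="+d,..."

def dbug_execute_if(keyword, action):
    # stands in for DBUG_EXECUTE_IF(): runs action only when the
    # keyword has been enabled (i.e. only in "debug builds")
    if keyword in debug_keywords:
        action()

class SimulatedDisconnect(Exception):
    pass

def io_thread(events, kill_after):
    received = []
    count = 0

    def maybe_kill():
        # debug-only hook: force a "network error" after kill_after events
        if count >= kill_after:
            raise SimulatedDisconnect()

    for ev in events:
        received.append(ev)
        count += 1
        try:
            dbug_execute_if("kill_slave_io_after_n_events", maybe_kill)
        except SimulatedDisconnect:
            break
    return received

group = ["BEGIN GTID 2-1-100", "INSERT 1", "INSERT 2", "COMMIT"]

# Without the keyword enabled, the whole group is received...
assert io_thread(group, kill_after=2) == group

# ...with it enabled, the IO thread dies right after the first INSERT,
# leaving a half-received event group for the test to examine.
debug_keywords.add("kill_slave_io_after_n_events")
print(io_thread(group, kill_after=2))   # ['BEGIN GTID 2-1-100', 'INSERT 1']
```

This is why the injected kill points are so useful: they turn a rare,
timing-dependent production failure into a deterministic test case.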
_______________________________________________
Mailing list: https://launchpad.net/~maria-developers
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp

