Some additional information here, just my 2 cents.

The system variable slave_net_timeout controls how long the slave server waits for new data from the master's binlog. If the connection stays inactive for longer than slave_net_timeout seconds, the slave reconnects. Here is a good point to observe: the connection between slave and master is just an ordinary connection made with the user one has configured as the replication user, with certain privileges (just REPLICATION SLAVE most of the time). So if the slave dies, the connection made by that user dies as well. Both SHOW SLAVE HOSTS and SHOW SLAVE STATUS can report that the connection between both sides is OK, which has nothing to do with whether replication itself is OK: there is another thread, and you can check that one as well.
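To make the point about the two slave threads concrete, here is a minimal Python sketch (not part of my tests; the names parse_status and replication_state are made up for illustration). It assumes you already have the text of SHOW SLAVE STATUS in the vertical \G format, and it only looks at the Slave_IO_Running and Slave_SQL_Running fields.

```python
# Hypothetical helpers: classify replication health from SHOW SLAVE STATUS text.

def parse_status(text):
    # Turn "Key: value" lines into a dict; lines without a colon are ignored.
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def replication_state(fields):
    # The I/O thread reflects the connection to the master,
    # the SQL thread reflects whether events are being applied.
    io_ok = fields.get("Slave_IO_Running") == "Yes"
    sql_ok = fields.get("Slave_SQL_Running") == "Yes"
    if io_ok and sql_ok:
        return "ok"
    if io_ok:
        return "connected, but SQL thread stopped"
    if sql_ok:
        return "SQL thread running, but I/O thread not connected"
    return "stopped"


sample = "Slave_IO_Running: Yes\nSlave_SQL_Running: No"
print(replication_state(parse_status(sample)))
```

With this sample it reports the I/O thread as connected and the SQL thread as stopped, which is the case where the connection looks fine but nothing is being applied.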
For example, SHOW SLAVE STATUS can show the I/O thread connected while the SQL thread is stopped:

#: SHOW SLAVE STATUS
Slave_IO_Running: Yes
Slave_SQL_Running: No

When the slave server starts the connection with the master (START SLAVE), there is a kind of handshake that sets up the replication structures: it starts the Binlog Dump Thread on the master side and two other threads on the slave side, the Slave I/O and SQL threads. If there is no interaction for slave_net_timeout seconds, the slave disconnects and connects again, and the thread running on the master is restarted as well.

Just checking in: using two idle servers in replication, I set slave_net_timeout=1 and log_warnings=2 globally on the slave side, as I'm using 5.6 for these tests. The interest here is to check the reconnection made by the slave and, with that, the restart of the Binlog Dump Thread on the master. Looking at the MySQL error log...

#: slave error log - reported every 5 secs
2015-06-25 11:38:21 2598 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2015-06-25 11:38:26 2598 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2015-06-25 11:38:31 2598 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
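The spacing of error log entries like these can be measured instead of read off by eye. A minimal Python sketch, assuming every entry starts with a "YYYY-MM-DD HH:MM:SS" timestamp as in the 5.6 log above; log_intervals is a hypothetical helper name, and the messages are cut short here for brevity.

```python
from datetime import datetime


def log_intervals(lines):
    # Return the gaps, in seconds, between consecutive timestamped log entries.
    stamps = []
    for line in lines:
        try:
            # The first 19 characters hold "YYYY-MM-DD HH:MM:SS".
            stamps.append(datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S"))
        except ValueError:
            continue  # skip lines that do not start with a timestamp
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


slave_log = [
    "2015-06-25 11:38:21 2598 [Warning] Storing MySQL user name or password ...",
    "2015-06-25 11:38:26 2598 [Warning] Storing MySQL user name or password ...",
    "2015-06-25 11:38:31 2598 [Warning] Storing MySQL user name or password ...",
]
print(log_intervals(slave_log))  # [5.0, 5.0]
```

The same function can be pointed at the master error log to compare how often each side records a reconnection.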
The slave error log shows clearly that the slave keeps reconnecting (every second, going by slave_net_timeout=1): the warnings written there are the same ones that appear when one issues a START SLAVE (with no user and password, nor SSL) using e.g. the mysql client.

#: master side error log - reported every 3 secs
2015-06-25 11:41:05 2648 [Note] Start binlog_dump to master_thread_id(120) slave_server(256380), pos(mysql-bin.000002, 120)
2015-06-25 11:41:08 2648 [Note] Start binlog_dump to master_thread_id(121) slave_server(256380), pos(mysql-bin.000002, 120)
2015-06-25 11:41:11 2648 [Note] Start binlog_dump to master_thread_id(122) slave_server(256380), pos(mysql-bin.000002, 120)

We can see that the master Binlog Dump Thread is re-initialized as well when the Slave I/O Thread reconnects. BTW, SHOW PROCESSLIST takes at least 10 seconds to report that a slave connection has died; we can tell that a new connection was made by observing the increment of the thread id.

*************************** 2. row ***************************
     Id: 154
   User: wb
   Host: 192.168.0.101:59416
     db: NULL
Command: Binlog Dump
   Time: 11
  State: Master has sent all binlog to slave; waiting for binlog to be updated
   Info: NULL
2 rows in set (0.00 sec)

*************************** 2. row ***************************
     Id: 155
   User: wb
   Host: 192.168.0.101:59417
     db: NULL
Command: Binlog Dump
   Time: 2
  State: Master has sent all binlog to slave; waiting for binlog to be updated
   Info: NULL
2 rows in set (0.00 sec)

SHOW SLAVE HOSTS will remain the same in this case.
mysql> show slave hosts;
+-----------+------+------+-----------+--------------------------------------+
| Server_id | Host | Port | Master_id | Slave_UUID                           |
+-----------+------+------+-----------+--------------------------------------+
|    256380 | s1   | 3306 |   1624536 | 11df2405-0ee5-11e5-aea6-0800274fb806 |
+-----------+------+------+-----------+--------------------------------------+
1 row in set (0.01 sec)

mysql> show slave hosts;
+-----------+------+------+-----------+--------------------------------------+
| Server_id | Host | Port | Master_id | Slave_UUID                           |
+-----------+------+------+-----------+--------------------------------------+
|    256380 | s1   | 3306 |   1624536 | 11df2405-0ee5-11e5-aea6-0800274fb806 |
+-----------+------+------+-----------+--------------------------------------+
1 row in set (0.00 sec)

mysql> show slave hosts;
+-----------+------+------+-----------+--------------------------------------+
| Server_id | Host | Port | Master_id | Slave_UUID                           |
+-----------+------+------+-----------+--------------------------------------+
|    256380 | s1   | 3306 |   1624536 | 11df2405-0ee5-11e5-aea6-0800274fb806 |
+-----------+------+------+-----------+--------------------------------------+
1 row in set (0.00 sec)

Even with the slave reconnecting every second, the slave error log reports a reconnection every 5 secs, SHOW PROCESSLIST shows a new thread id every 10 secs, and the master reports the start of the Binlog Dump Thread every 3 secs.

From here, we need to investigate more...

--
*Wagner Bianchi, +55.31.8654.9510*
Oracle ACE Director <https://apex.oracle.com/pls/otn/f?p=19297:4:105567988301604::NO:4:P4_ID:4541>, MySQL Certified Professional
Percona MySQL Forum <http://www.percona.com/forums/> Community V.I.P.
Email: m...@wagnerbianchi.com
Skype: wbianchijr

2015-06-25 2:48 GMT-03:00 Ben RUBSON <ben.rub...@gmail.com>:

> 2015-06-22 13:45 GMT+02:00 Ben RUBSON <ben.rub...@gmail.com>:
>
> > 2015-06-19 12:08 GMT+02:00 Ben RUBSON <ben.rub...@gmail.com>:
> >>
> >> 2015-06-18 22:52 GMT+02:00 shawn l.green <shawn.l.gr...@oracle.com>:
> >>>
> >>> On 6/18/2015 2:10 PM, Ben RUBSON wrote:
> >>>>
> >>>> Hello,
> >>>>
> >>>> In order for the slave to quickly show a communication issue between
> >>>> the master and the slave, I set slave_net_timeout to 10.
> >>>> "show slave status" then quickly updates, perfect.
> >>>>
> >>>> I would also like the master to quickly show when the slave is no more
> >>>> reachable.
> >>>>
> >>>> However, "show processlist" and "show slave hosts" take a very long
> >>>> time to update their status when the slave has gone.
> >>>> Is there any way to have a refresh rate of about 10 seconds, as I did
> >>>> on slave side ?
> >>>
> >>> There are two situations to consider
> >>>
> >>> 1) The slave is busy re-trying. It will do this a number of times then
> >>> eventually disconnect itself. If it does disconnect itself, the
> >>> processlist report will show it as soon as that happens.
> >>
> >> Yes, I confirm.
> >>
> >>> 2) The connection between the master and slave died (or the slave
> >>> itself is lost). In this case, the server did not receive any "I am
> >>> going to disconnect" message from its client (the slave). So as far as
> >>> the server is concerned, it is simply sitting in a wait expecting the
> >>> client to eventually send in a new command packet.
> >>>
> >>> That wait is controlled by --wait-timeout. Once an idle client
> >>> connection hits that limit, the server is programmed to think "the
> >>> idiot on the other end of this call has hung up on me" so it simply
> >>> closes its end of the socket. There are actually two different timers
> >>> that could be used, --wait-timeout or --interactive-timeout and which
> >>> one is used to monitor the idle socket depends entirely on if the
> >>> client did or did not set the 'interactive flag' when it formed the
> >>> connection. MySQL slaves do not use that flag.
> >>>
> >>> Now, if the line between the two systems died in the middle of a
> >>> conversation (an actual data transfer) then a shorter
> >>> -net-write-timeout or --net-read-timeout would expire and the session
> >>> would die then.
> >>
> >> This is the interesting part yes, when the connection dies (whatever
> >> the link status is at this moment, idle or not).
> >> So I set wait_timeout=10.
> >>
> >> When the link is up, we clearly see that the idle connection is reset
> >> every 10 seconds : the "show processlist" clearly shows that the slave
> >> TCP source port changes, and time is reset from 10 to 0.
> >> Perfect.
> >
> > Well this behavior is due to slave_net_timeout, not to wait_timeout.
> > So neither wait_timeout nor interactive_timeout (expected),
> > net_read_timeout, net_write_timeout helped.
> >
> >> However, when the link dies, the "Binlog Dump" process stays in the
> >> "show processlist", I have to wait more than 1000 seconds for it to
> >> disappear.
> >> I made tests adding interactive_timeout=10, net_read_timeout=10 and
> >> net_write_timeout=10, however the behavior is the same.
> >>
> >> Did I miss something ?
> >>
> >> Of course goal is to monitor replication, from the slave (done and
> >> working thanks to slave_net_timeout), but from the master too (some
> >> more tuning needed), as we never know which one will be able to
> >> transmit the alert properly.
> >>
> >> Thank you very much Shawn.
>
> Hello,
>
> Would you have any further advice on this topic please ?
>
> Thank you again,
>
> Best regards,
>
> Ben