Frank,

Frank Bottone wrote:
I've been having trouble with my master/slave server - recently I was having a few repeated issues where the mysql slave would stop due to "invalid sql syntax", but the queries executed fine on the master. I would have to manually dig through the logs and then find the query to manually execute on the slave, then use skip_counter to resume the replication skipping the corrupted statement on the slave. I thought it might be hardware related since it was only affecting the slave, so I moved it to a different blade (both the servers are blades).

However, today I was greeted with a nagios alert that the slave had stopped again. This time, it seems like the relay log is definitely corrupt. I was able to run mysqlbinlog > /dev/null on all the master logs, none are corrupt (including the one it had read up to on the slave). The relay log on the slave is though - it reports
"[EMAIL PROTECTED] mysql]# mysqlbinlog mysql02-relay-bin.010923 > /dev/null
ERROR: Error in Log_event::read_log_event(): 'read error', data_len: 38210134, event_type: 0
Could not read entry at offset 618730:Error in log format or read error"

Nothing too much different in the logs either:

071006 11:18:52 [Note] Slave I/O thread: connected to master '[EMAIL PROTECTED] 4:3306', replication started in log 'mysql-bin.000104' at position 906124600
071008  9:07:12 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
071008  9:07:13 [Note] Slave I/O thread: Failed reading log event,

... snip ...

their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: 0
071008 12:15:33 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped
at log 'mysql-bin.000105' position 893425700


Any help or ideas tracking this down would be appreciated - I think we are going to have to take down the production database to resync the two and get replication going again. We mainly use the replica for backup purposes in order to avoid downtime during the backup and in the event of a hardware issue with the master.

No need to take down the master or re-initialize the slave, given what I've seen so far. Just tell the slave to throw away its relay logs and re-fetch from the master. From the output you showed, the SQL thread stopped at mysql-bin.000105, position 893425700, so on the slave:

CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000105', MASTER_LOG_POS=893425700;

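This will discard the relay logs and make the I/O thread re-fetch events from the master, starting at that position. Stop the slave threads first (STOP SLAVE) and start them again afterwards (START SLAVE). As long as mysql-bin.000105 hasn't been purged on the master, you should be OK.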
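
The coordinates in the statement come from the slave's error log; SHOW SLAVE STATUS reports the same pair as Relay_Master_Log_File and Exec_Master_Log_Pos. As a rough sketch only, the whole sequence could be assembled like this. The helper name build_resync_statements is made up for illustration, and it only builds the statements; it does not connect to anything or run them:

```python
def build_resync_statements(status):
    """Build the SQL that discards the relay logs and re-fetches from the master.

    `status` is a dict holding two columns copied from SHOW SLAVE STATUS:
    Relay_Master_Log_File and Exec_Master_Log_Pos, i.e. the master coordinates
    of the last event the slave SQL thread executed.
    """
    log_file = status["Relay_Master_Log_File"]
    log_pos = int(status["Exec_Master_Log_Pos"])

    # Refuse anything that would break out of the quoted file name.
    if "'" in log_file or "\\" in log_file:
        raise ValueError("unexpected character in log file name")

    return [
        "STOP SLAVE;",
        "CHANGE MASTER TO MASTER_LOG_FILE='%s', MASTER_LOG_POS=%d;"
        % (log_file, log_pos),
        "START SLAVE;",
    ]


# Values taken from the error log quoted in this thread.
for statement in build_resync_statements(
    {"Relay_Master_Log_File": "mysql-bin.000105",
     "Exec_Master_Log_Pos": "893425700"}
):
    print(statement)
```

Using the executed position rather than the I/O thread's read position is the point here: everything the slave fetched but had not yet applied gets fetched again, so the corrupt part of the relay log is never replayed.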

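You might want to take a look at mysql-table-checksum, which checksums tables so you can compare the master with the slave. Your data could be fine, but after the statements you skipped and re-ran by hand, it might also be different on the slave. There's no need to worry about it until you prove it: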

http://mysqltoolkit.sourceforge.net/
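
To illustrate the idea only (this is not how mysql-table-checksum itself works), here is a minimal sketch that compares per-table checksums you have already collected from each server, for example with CHECKSUM TABLE. The function name differing_tables and the sample table names and values are invented for the example:

```python
def differing_tables(master_checksums, slave_checksums):
    """Return the sorted names of tables whose checksums do not match.

    Both arguments map a table name to its checksum. A table that is
    present on only one side is reported as differing.
    """
    tables = set(master_checksums) | set(slave_checksums)
    return sorted(
        t for t in tables
        if master_checksums.get(t) != slave_checksums.get(t)
    )


# Invented sample data, not taken from any real server.
master = {"db.orders": 3217894411, "db.users": 998877, "db.audit": 42}
slave = {"db.orders": 3217894411, "db.users": 123456}

print(differing_tables(master, slave))
```

A comparison like this is only meaningful if both sides are checksummed at the same point in the replication stream; a slave that is still catching up will show differences that are not real drift.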

The corruption in your relay logs could be caused by any number of things -- bad network, bad hardware, a software bug... You could add your voice to an outstanding bug report:

http://bugs.mysql.com/bug.php?id=25737

Hope that helps
Baron

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
