On Fri, Apr 26, 2002 at 01:28:59PM -0600, Sasha Pachev wrote: [snip]
Sasha, here's an update with some context preserved for your sanity. :-)

> > I recently wiped my 4.0.2 slave clean and installed the latest
> > 4.0.2, built from the BK tree. Then I synced it up with a nearby
> > slave running 3.23.47 (using rsync after I had flushed the tables
> > on the other slave and run a "SLAVE STOP").
> >
> > I started it up and it ran for about a day before it ran into a
> > duplicate key error. The 3.23.47 slave hasn't hit the duplicate
> > key error, nor have any of our other slaves. So it is a 4.0.2 bug
> > of some sort.

[snip]

> * use mysqlbinlog to check the current relay log in the SQL thread
>   ( you can see it in SHOW SLAVE STATUS) at the current position and
>   one event prior to see if this is the same query. You will probably
>   want to do mysqlbinlog log-name > log.sql, search for the current
>   position in a text editor, and then scroll back one entry

I spent all of Friday on that--literally. And you have the patch I
made for mysqlbinlog to make the job easier.

After hours of digging thru logs on the master, the 4.0.2 slave, and a
3.23.47 slave, I've concluded that they're all getting and attempting
to execute exactly the same queries. I found no evidence of
duplication or missing records in the relay log or binary log.

The problem is something a bit more mysterious in 4.0.2. I don't yet
know if it is replication specific or not. But the good news is that
it *seems* to be repeatable.

After hacking on this for quite some time on Friday, I just let the
slave sit. Then on Monday, I did:

  * slave stop
  * set sql_slave_skip_counter = 1
  * slave start

and let it run like crazy. After a couple hours went by, I found that
the slave had stopped again. It had the same error on the same table
on the same basic query!

So I'm working to reproduce it under more controlled circumstances.
I'd rather not submit a bug report that requires 2GB of relay log
files to test.
:-)

As a side note, do you recall my complaints about how it can take a
long time to get the "mysql> " prompt back (on FreeBSD) after
executing the "slave stop; set ... ; slave start;" sequence? This
time I watched in another window to see what MySQL was doing while I
waited for my prompt to come back. To my surprise, the binary log on
the slave was updating, so it was clearly replaying queries from the
relay log. But it still took another 20 seconds or so to get that
prompt back. Strange. It feels like there's a lock it's trying to
obtain unnecessarily or something.

Anyway, that's where things stand. There's *some bug* in 4.0.2, but
I'm no longer convinced that it's a replication bug.

Jeremy
--
Jeremy D. Zawodny, <[EMAIL PROTECTED]>
Technical Yahoo - Yahoo Finance
Desk: (408) 349-7878   Fax: (408) 349-5454   Cell: (408) 685-5936

MySQL 3.23.47-max: up 81 days, processed 2,110,710,335 queries (299/sec. avg)
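
P.S. For anyone repeating the relay-log check quoted above (dump with
mysqlbinlog, find the event at the current position, then look at the
one before it), here is a rough Python sketch of that step. It is not
the mysqlbinlog patch mentioned in the message. It assumes the text
dump marks the start of each event with a "# at <offset>" line; the
function names split_events and event_and_previous are made up for
illustration.

```python
import re
from typing import List, Optional, Tuple

# Assumption: each event in the mysqlbinlog text dump starts with a
# line of the form "# at <offset>".
AT_LINE = re.compile(r"^# at (\d+)\s*$")


def split_events(dump_text: str) -> List[Tuple[int, str]]:
    """Split a mysqlbinlog text dump into (offset, event_text) pairs."""
    events: List[Tuple[int, str]] = []
    offset: Optional[int] = None
    body: List[str] = []
    for line in dump_text.splitlines():
        match = AT_LINE.match(line)
        if match:
            # A new event begins; store the one collected so far.
            if offset is not None:
                events.append((offset, "\n".join(body)))
            offset = int(match.group(1))
            body = []
        elif offset is not None:
            body.append(line)
    if offset is not None:
        events.append((offset, "\n".join(body)))
    return events


def event_and_previous(dump_text: str, position: int) -> Tuple[Optional[str], str]:
    """Return (previous_event_text, event_text) for the event at `position`.

    previous_event_text is None when the event is the first in the dump.
    Raises ValueError if no event starts exactly at `position`.
    """
    events = split_events(dump_text)
    for index, (offset, text) in enumerate(events):
        if offset == position:
            previous = events[index - 1][1] if index > 0 else None
            return previous, text
    raise ValueError(f"no event starts at offset {position}")
```

To use it, read the file produced by "mysqlbinlog log-name > log.sql"
into a string and pass it in along with the position reported by SHOW
SLAVE STATUS. If the two returned events contain the same query, the
statement was logged twice; if they differ, the duplicate key error
has some other cause.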