Claudio,

Thank you for your interest. I will wait for the issue to happen again and will see what kind of information I can get back with strace. This is indeed something I didn't think of trying yet.
I'll keep you all posted on this. I much appreciate the new approaches and fresh ideas.

Kind regards,
--
Luis Motta Campos

On 23 Oct 2011, at 23:27, Claudio Nanni <claudio.na...@gmail.com> wrote:

> Luis,
>
> Very hard to tackle.
> In my experience, excluding bottlenecks external to MySQL (hardware, OS,
> etc.), the 'suspects' are the shared resources 'guarded' by unique mutexes,
> like the query cache or the key cache.
> Since you do not use MyISAM it cannot be the key cache. Since you use Percona
> the query cache is disabled by default.
> You should go a bit lower level and catch the system calls with one of the
> tools you surely know to see if there are waits on the semaphores.
>
> I would also like to point out that the 'seconds behind master' reported by
> the slave is not reliable.
>
> Good luck!
>
> Claudio
>
> 2011/10/23 Tyler Poland <tpol...@engineyard.com>
>
>> Luis,
>>
>> How large is your database? Have you checked for an increase in write
>> activity on the master leading up to this? Are you running a backup against
>> the replica?
>>
>> Thank you,
>> Tyler
>>
>> Sent from my Droid Bionic
>> On Oct 23, 2011 5:40 AM, "Luis Motta Campos" <luismottacam...@yahoo.co.uk>
>> wrote:
>>
>>> Fellow DBAs and MySQL Users
>>>
>>> [apologies for possible duplicates - I've posted this to
>>> percona-discuss...@googlegroups.com also]
>>>
>>> I've been hunting an issue with my database cluster for several months now
>>> without much success. Maybe I'm overlooking something here.
>>>
>>> I've been observing the database slowing down and lagging behind for
>>> thousands of seconds (sometimes over the course of several days) even
>>> without any query load besides replication itself.
>>>
>>> I am running Percona MySQL 5.1.51 (InnoDB plug-in version 1.12) on Dell
>>> R710 (6 x 3.5 inch 15K RPM disks in RAID10; 24GB RAM; 2x Quad-core Intel
>>> processors) running Debian Lenny.
>>> MySQL data, binary logs, relay logs, and InnoDB log files are on separate
>>> partitions from each other, on a RAID system separate from the operating
>>> system disks.
>>>
>>> Default Storage Engine is InnoDB, and the usual InnoDB memory structures
>>> are stable and look healthy.
>>>
>>> I have about 500 (read) queries per second on average, and about 10% of
>>> this as writes on the master.
>>>
>>> I've been observing something that looks like between 6 and 10 pending
>>> reads per second uniformly on my Cacti graphs.
>>>
>>> The issue is characterized by the server suddenly slowing down writes
>>> without any previous warning or change, and lagging behind for several
>>> thousand seconds (triggering all sorts of alerts on my monitoring system).
>>> I don't observe extra CPU activity, just a reduced disk access rate (from
>>> about 5-6MB/s to 500KB/s) and replication lagging. I could correlate it
>>> neither with InnoDB hashing activity, nor with long-running queries, nor
>>> with background read/write thread activity.
>>>
>>> I don't have any clues about what is causing this behavior, and I'm unable
>>> to reproduce it under controlled conditions. I've observed the issue both
>>> on servers with and without workload (apart from the usual replication
>>> load). I am sure no changes were applied to the server or to the cluster.
>>>
>>> I'm looking forward to suggestions and theories on the issue - all ideas
>>> are welcome.
>>> Thank you for your time and attention,
>>> Kind regards,
>>> --
>>> Luis Motta Campos
>>> is a DBA, Foodie, and Photographer
>
> --
> Claudio
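
For anyone following the strace suggestion in this thread, here is a minimal sketch of how the captured output could be summarised to see whether waits on semaphores (typically `futex` calls on Linux) dominate. It assumes the trace was taken with something like `strace -f -T -p <mysqld pid> -o mysqld.trace`, so each completed call ends with its elapsed time in angle brackets. The file name `mysqld.trace` and the function name `summarize` are made up for illustration; nothing here comes from the original posters.

```python
import re
from collections import defaultdict

# Matches completed syscall lines produced by "strace -f -T", for example:
#   [pid 1234] futex(0x7f00, FUTEX_WAIT_PRIVATE, 2, NULL) = 0 <0.500000>
#   1234 <... futex resumed> ) = 0 <1.250000>
# Lines ending in "<unfinished ...>" carry no timing and are skipped.
CALL_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+)?(?:\d+\s+)?"         # optional pid prefix
    r"(?:<\.\.\. (\w+) resumed>|(\w+)\()"        # resumed call or fresh call
    r".*<(\d+\.\d+)>\s*$"                        # elapsed time from -T
)


def summarize(lines):
    """Return {syscall: (call_count, total_seconds)} for timed strace lines."""
    totals = defaultdict(lambda: [0, 0.0])
    for line in lines:
        match = CALL_RE.match(line.strip())
        if match is None:
            continue  # signals, unfinished calls, exit messages, etc.
        name = match.group(1) or match.group(2)
        totals[name][0] += 1
        totals[name][1] += float(match.group(3))
    return {name: (count, secs) for name, (count, secs) in totals.items()}


def report(summary):
    """Format the summary as text lines, slowest syscall first."""
    ordered = sorted(summary.items(), key=lambda item: item[1][1], reverse=True)
    return [f"{name:<16} calls={count:<8} total={secs:.6f}s"
            for name, (count, secs) in ordered]
```

To use it, one would pass the open trace file to `summarize` and print the lines from `report`. If `futex` sits at the top with a total far above the I/O calls while the slave is lagging, that would be consistent with the mutex contention theory; it would not prove it.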