Re: 5.1.51 Database Replica Slows Down Suddenly, Lags For Days, and Recovers Without Intervention

Luis Motta Campos Sun, 23 Oct 2011 23:10:09 -0700

Claudio, 

Thank you for your interest. 
I will wait for the issue to happen again and will see what kind of information 
I can get back with strace. This is indeed something I didn't think of trying 
yet.


I'll keep you people posted on this. 
The new approaches and fresh ideas are much appreciated. 
Kind regards,
--
Luis Motta Campos

On 23 Oct 2011, at 23:27, Claudio Nanni <claudio.na...@gmail.com> wrote:

> Luis,
> 
> Very hard to tackle.
> In my experience, excluding bottlenecks external to MySQL (hardware,
> o.s. etc.), the 'suspects' are the shared resources 'guarded' by unique
> mutexes, like the query cache or the key cache.
> Since you do not use MyISAM it cannot be the key cache. Since you use Percona
> the query cache is disabled by default.
> You should go a bit lower level and catch the system calls with one of the
> tools you surely know to see if there are waits on the semaphores.
> 
> I would also like to point out that the 'seconds behind master' reported by
> the slave is not reliable.
> 
> Good luck!
> 
> Claudio
> 
> 2011/10/23 Tyler Poland <tpol...@engineyard.com>
> 
>> Luis,
>> 
>> How large is your database?  Have you checked for an increase in write
>> activity on the master leading up to this? Are you running a backup against
>> the replica?
>> 
>> Thank you,
>> Tyler
>> 
>> Sent from my Droid Bionic
>> On Oct 23, 2011 5:40 AM, "Luis Motta Campos" <luismottacam...@yahoo.co.uk>
>> wrote:
>> 
>>> Fellow DBAs and MySQL Users
>>> 
>>> [apologies for possible duplicates - I've posted this to
>>> percona-discuss...@googlegroups.com also]
>>> 
>>> I've been hunting an issue with my database cluster for several months
>>> now without much success. Maybe I'm overlooking something here.
>>> 
>>> I've been observing the database slowing down and lagging behind for
>>> thousands of seconds (sometimes over the course of several days) even
>>> without any query load besides replication itself.
>>> 
>>> I am running Percona MySQL 5.1.51 (InnoDB plug-in version 1.12) on Dell
>>> R710 (6 x 3.5 inch 15K RPM disks in RAID10; 24GB RAM; 2x Quad-core Intel
>>> processors) running Debian Lenny. MySQL data, binary logs, relay logs, and
>>> InnoDB log files are on separate partitions, on a RAID array separate
>>> from the operating system disks.
>>> 
>>> Default Storage Engine is InnoDB, and the usual InnoDB memory structures
>>> are stable and look healthy.
>>> 
>>> I have about 500 (read) queries per second on average, and about 10% of
>>> this as writes on the master.
>>> 
>>> I've been observing something that looks like between 6 and 10 pending
>>> reads per second uniformly on my cacti graphs.
>>> 
>>> The issue is characterized by the server suddenly slowing down writes
>>> without any previous warning or change, and lagging behind for several
>>> thousand seconds (triggering all sorts of alerts on my monitoring
>>> system). I don't observe extra CPU activity, just a reduced disk access
>>> rate (from about 5-6MB/s to 500KB/s) and replication lagging. I could
>>> correlate it neither with InnoDB hashing activity, nor with
>>> long-running queries, nor with background read/write thread activity.
>>> 
>>> I don't have any clue what is causing this behavior, and I'm unable
>>> to reproduce it under controlled conditions. I've observed the issue both
>>> on servers with and without workload (apart from the usual replication
>>> load). I am sure no changes were applied to the server or to the cluster.
>>> 
>>> I'm looking forward to suggestions and theories on the issue - all ideas
>>> are welcome.
>>> Thank you for your time and attention,
>>> Kind regards,
>>> --
>>> Luis Motta Campos
>>> is a DBA, Foodie, and Photographer
>>> 
>>> 
>>> --
>>> MySQL General Mailing List
>>> For list archives: http://lists.mysql.com/mysql
>>> To unsubscribe:
>>> http://lists.mysql.com/mysql?unsub=tpol...@engineyard.com
>>> 
>>> 
>> 
> 
> 
> 
> -- 
> Claudio
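
Claudio's suggestion in the thread (catch the system calls and look for waits
on the semaphores) could be followed up roughly as below. This is only a
sketch in Python, not something anyone in the thread ran. It assumes the trace
was captured with something like
`strace -f -T -e trace=futex -p <mysqld pid> -o trace.txt`, and that mutex and
semaphore waits surface as futex calls, which is usual for pthread-based
locking on Linux but has not been verified for this server. The names
`LINE_RE` and `summarize` are made up for illustration.

```python
import re
from collections import defaultdict

# Matches strace lines that end with the "<seconds>" suffix added by -T, e.g.
#   [pid 101] futex(0x7f00, FUTEX_WAIT_PRIVATE, 2, NULL) = 0 <0.500000>
#   101 <... futex resumed> ) = 0 <1.250000>
# Lines ending in "<unfinished ...>" carry no duration and do not match.
LINE_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?"   # optional thread id prefix from -f
    r"(?:(?P<name>\w+)\(|<\.\.\. (?P<resumed>\w+) resumed>)"
    r".*<(?P<secs>\d+\.\d+)>\s*$"      # time spent in the call, from -T
)


def summarize(lines):
    """Return {syscall name: (number of calls, total seconds spent)}."""
    totals = defaultdict(lambda: [0, 0.0])
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # signals, exits, unfinished calls, anything unparsed
        name = match.group("name") or match.group("resumed")
        totals[name][0] += 1
        totals[name][1] += float(match.group("secs"))
    return {name: (count, secs) for name, (count, secs) in totals.items()}
```

Calling `summarize(open("trace.txt"))` on a trace taken while the replica is
lagging, and again on one taken while it is healthy, would show whether the
time spent in futex grows during the slowdown. A large difference would point
towards lock contention; no difference would suggest looking elsewhere.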
