Re: [ceph-users] OSD Restarts cause excessively high load average and "requests are blocked > 32 sec"

Quenten Grasso Mon, 31 Mar 2014 16:18:07 -0700

Thanks Greg,

Looking forward to the new release!


Regards,
Quenten Grasso

-----Original Message-----
From: Gregory Farnum [mailto:[email protected]] 
Sent: Tuesday, 1 April 2014 3:08 AM
To: Quenten Grasso
Cc: Kyle Bader; [email protected]
Subject: Re: [ceph-users] OSD Restarts cause excessively high load average and "requests are blocked > 32 sec"

Yep, that looks like http://tracker.ceph.com/issues/7093, which is fixed in
Dumpling and most of the dev releases since Emperor. ;) I also cherry-picked
the fix to the emperor branch, and it will be included whenever we do another
Emperor point release.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Tue, Mar 25, 2014 at 6:39 PM, Quenten Grasso <[email protected]> wrote:
> Hi Greg,
>
> Restarting the service itself (i.e. "service ceph restart osd.50") only
> takes a few seconds.
>
> Attached is the ceph -w output from running only "service ceph restart osd.50".
>
> You can see it marks itself down almost immediately. It then takes a little
> while to mark itself up again and finish "recovery".
>
> If I do this to all 12 OSDs, the node goes crazy. It's almost as if the
> node is CPU-bound, even though it has 6 cores, and the load average goes
> to 300+.
>
> http://pastie.org/pastes/8968950/text?key=0e0bs1ojbm2arnexn52iwq
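A rough sketch of restarting a host's OSDs one at a time instead of all 12 together, so that only one OSD is peering at any moment. It assumes the sysvinit "service ceph restart osd.N" syntax used above and a cluster that is HEALTH_OK before you start; the function names, the 10-second poll interval and the 600-second limit are illustrative choices, not anything from this thread. It may only spread the load out and is not a substitute for the tracker 7093 fix.

```shell
#!/bin/sh
# Sketch: restart OSDs one at a time, waiting for HEALTH_OK between restarts.
# Function names, poll interval and timeout are illustrative assumptions.

# Poll `ceph health` every $2 seconds (default 10) for up to $1 seconds
# (default 600). Returns 0 once the output starts with HEALTH_OK, else 1.
wait_for_health_ok() {
    timeout_s=${1:-600}
    interval_s=${2:-10}
    waited=0
    while [ "$waited" -lt "$timeout_s" ]; do
        # Sleep first so the monitors have time to notice the restarted OSD.
        sleep "$interval_s"
        waited=$((waited + interval_s))
        case "$(ceph health 2>/dev/null)" in
            HEALTH_OK*) return 0 ;;
        esac
    done
    return 1
}

# Restart each OSD id given as an argument, stopping at the first one that
# fails to restart or does not bring the cluster back to HEALTH_OK in time.
rolling_restart() {
    for id in "$@"; do
        service ceph restart "osd.$id" || return 1
        if ! wait_for_health_ok 600 10; then
            echo "osd.$id: cluster not HEALTH_OK after 600s, stopping" >&2
            return 1
        fi
    done
}
```

For example, "rolling_restart 50 51 52" on the host that carries those (hypothetical) OSD ids. Stopping at the first failure is deliberate, so one unhealthy OSD does not get piled on by the next restart.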
>
> Regards,
> Quenten
>
> -----Original Message-----
> From: Gregory Farnum [mailto:[email protected]]
> Sent: Wednesday, 26 March 2014 2:02 AM
> To: Quenten Grasso
> Cc: Kyle Bader; [email protected]
> Subject: Re: [ceph-users] OSD Restarts cause excessively high load average and "requests are blocked > 32 sec"
>
> How long does it take for the OSDs to restart? Are you just issuing a restart 
> command via upstart/sysvinit/whatever? How many OSDMaps are generated from 
> the time you issue that command to the time the cluster is healthy again?
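The last question can be answered from the OSDMap epoch, which goes up by one for every new map. A minimal sketch, assuming the plain-text output of "ceph osd dump" contains a line of the form "epoch N"; osdmap_epoch is a hypothetical helper name, not a Ceph command.

```shell
# Print the OSDMap epoch from `ceph osd dump` output read on stdin
# (assumes a line of the form "epoch 1234"); prints nothing if absent.
osdmap_epoch() {
    awk '$1 == "epoch" { print $2; exit }'
}

# Usage on a live cluster (not run here):
#   before=$(ceph osd dump | osdmap_epoch)
#   service ceph restart osd.50
#   # ...wait until `ceph health` reports HEALTH_OK again...
#   after=$(ceph osd dump | osdmap_epoch)
#   echo "new OSDMaps: $((after - before))"
```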
>
> This sounds like an issue we had for a while where OSDs would start peering 
> before they had processed the maps they needed to look at; the fix might not 
> have been backported to Emperor. But I'd like to be sure this isn't some 
> other issue you're seeing.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Sat, Mar 22, 2014 at 8:16 PM, Quenten Grasso <[email protected]> wrote:
>> Hi Kyle,
>>
>> Thanks, I turned on debug ms = 1 and debug osd = 10 and restarted osd.54;
>> here's the log for that one.
>>
>> ceph-osd.54.log.bz2
>> http://www67.zippyshare.com/v/99704627/file.html
>>
>>
>> strace of osd.53:
>> strace.zip
>> http://www43.zippyshare.com/v/17581165/file.html
>>
>>
>> Thanks,
>> Quenten
>> -----Original Message-----
>> From: Kyle Bader [mailto:[email protected]]
>> Sent: Sunday, 23 March 2014 12:10 PM
>> To: Quenten Grasso
>> Subject: Re: [ceph-users] OSD Restarts cause excessively high load average and "requests are blocked > 32 sec"
>>
>>> Any ideas on why the load average goes so crazy & starts to block IO?
>>
>> Could you turn on "debug ms = 1" and "debug osd = 10" prior to restarting
>> the OSDs on one of your hosts, and share the logs so we can take a look?
>>
>> It might also be worthwhile to strace one of the OSDs to try to determine
>> what it's working so hard on, maybe:
>>
>> strace -fc -p <osd pid> -o strace.osd1.log
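To fill in "<osd pid>" for one particular OSD and stop the capture automatically, something along these lines may help. It is a sketch that assumes the daemon's command line contains "ceph-osd" followed by "-i <id>" and that it runs as root (strace -p needs ptrace permission); the 60-second window and the function names are arbitrary choices. "timeout -s INT" sends the same signal as Ctrl-C, which makes strace detach and write its -c summary to the -o file.

```shell
# Build an extended regex matching a ceph-osd command line for OSD id $1,
# e.g. "/usr/bin/ceph-osd -i 53 --pid-file ...", but not "-i 530".
osd_cmdline_pattern() {
    printf 'ceph-osd.* -i %s( |$)' "$1"
}

# Attach strace to OSD $1 for $2 seconds (default 60), following all
# threads (-f) and writing the per-syscall summary (-c) to strace.osd$1.log.
# timeout exits with status 124 when the window elapses; that is expected.
strace_osd() {
    pid=$(pgrep -o -f "$(osd_cmdline_pattern "$1")") || {
        echo "no running ceph-osd found for id $1" >&2
        return 1
    }
    timeout -s INT "${2:-60}" strace -fc -p "$pid" -o "strace.osd$1.log"
}
```

For example, "strace_osd 53 120" would capture two minutes of osd.53 into strace.osd53.log.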
>>
>> Thanks!
>>
>> --
>>
>> Kyle
>> _______________________________________________
>> ceph-users mailing list
>> [email protected]
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com