Is there anything unusual in dmesg at osd.5?

On Wednesday, May 7, 2014 at 23:09, Craig Lewis wrote:

> I already have osd_max_backfills = 1 and osd_recovery_op_priority = 1.  
> 
> osd_recovery_max_active is the default 15, so I'll give that a try...  some 
> OSDs timed out during the injectargs.  I added it to ceph.conf, and restarted 
> them all.  
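> (Editor's note: a minimal sketch of the two ways these settings are usually
> applied -- runtime injection versus ceph.conf; the option names are the
> standard Ceph ones, and the values shown just mirror those discussed above:)

```
> # Runtime injection (can time out if the OSDs are already saturated, as seen here):
> ceph tell osd.* injectargs '--osd-recovery-max-active 1'
>
> # Persistent form: add to ceph.conf under [osd], then restart the daemons:
> [osd]
>     osd max backfills = 1
>     osd recovery max active = 1
>     osd recovery op priority = 1
```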
> 
> I was running RadosGW-Agent, but it's down now.  I disabled scrub and 
> deep-scrub as well.  All the Disk I/O is dedicated to recovery now.
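> (Editor's note: assuming the cluster-wide flags were used, disabling and
> later re-enabling scrubs looks like this:)

```
> ceph osd set noscrub
> ceph osd set nodeep-scrub
>
> # once recovery finishes:
> ceph osd unset noscrub
> ceph osd unset nodeep-scrub
```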
> 
> 15 minutes after the restart:
> 2014-05-07 13:03:19.249179 mon.0 [INF] osd.5 marked down after no pg stats 
> for 901.601323seconds
> 
> One of the OSDs (osd.5) never completed the peering process.  It's as if the 
> OSD locked up immediately after the restart, and it looks that way too: as 
> soon as osd.5 started peering, it went to exactly 100% CPU, and the other 
> OSDs started complaining that it wasn't responding to subops.
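> (Editor's note: a hedged sketch of places worth looking while osd.5 is
> spinning; the `ceph daemon` shorthand assumes a recent release -- older ones
> use `ceph --admin-daemon /var/run/ceph/ceph-osd.5.asok ...` instead, and
> `<pid>` below is the ceph-osd process for osd.5:)

```
> dmesg | tail -50                     # kernel or disk errors on osd.5's host
> top -H -p <pid>                      # which thread is pegged at 100% CPU
> ceph daemon osd.5 dump_historic_ops  # recent slow ops via the admin socket
```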
> 
> 
> 

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
