Is there anything unusual in dmesg on the host for osd.5?
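For context, a quick way to check might be something like the following on osd.5's host (the grep pattern is only a starting point, not an exhaustive list of kernel error strings):

```shell
# Scan the kernel ring buffer for disk/controller trouble
# (ATA errors, I/O failures, XFS complaints, etc.):
dmesg | egrep -i 'error|fail|ata[0-9]|xfs|sd[a-z]' | tail -20
```

If the drive behind osd.5's data directory is suspect, a SMART health check (`smartctl -H` on that device) is also worth a look.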
On Wednesday, May 7, 2014 at 23:09, Craig Lewis wrote:
> I already have osd_max_backfill = 1, and osd_recovery_op_priority = 1.
>
> osd_recovery_max_active is the default 15, so I'll give that a try... some
> OSDs timed out during the injectargs. I added it to ceph.conf and restarted
> them all.
>
> I was running RadosGW-Agent, but it's down now. I disabled scrub and
> deep-scrub as well. All the disk I/O is dedicated to recovery now.
>
> 15 minutes after the restart:
> 2014-05-07 13:03:19.249179 mon.0 [INF] osd.5 marked down after no pg stats for 901.601323seconds
>
> One of the OSDs (osd.5) didn't complete the peering process. It's like the
> OSD locked up immediately after the restart. It looks that way, too: as soon
> as osd.5 started peering, it went to exactly 100% CPU, and the other OSDs
> started complaining that it wasn't responding to subops.
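For reference, the recovery throttles discussed above would look roughly like this as a ceph.conf fragment (a sketch only; note that the option is spelled osd_max_backfills, with a trailing "s", in the Ceph docs):

```
[osd]
  osd_max_backfills = 1
  osd_recovery_op_priority = 1
  osd_recovery_max_active = 1
```

The same values can usually be applied at runtime with `ceph tell osd.* injectargs '--osd-recovery-max-active 1'`, though as seen above, injectargs can time out against an OSD that is already wedged, in which case a ceph.conf change plus restart is the fallback.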
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
