On 5/7/14 15:33, Dimitri Maziuk wrote:
> On 05/07/2014 04:11 PM, Craig Lewis wrote:
>> On 5/7/14 13:40, Sergey Malinin wrote:
>>> Check dmesg and SMART data on both nodes. This behaviour is similar to
>>> a failing hdd.
>> It does sound like a failing disk... but there's nothing in dmesg, and
>> smartmontools hasn't emailed me about a failing disk. The same thing is
>> happening to more than 50% of my OSDs, in both nodes.
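(For reference, the kind of per-disk check Sergey suggests would look roughly
like this; the device name below is a placeholder for whatever disk backs an
OSD:)

    # scan the kernel log for I/O trouble (search terms are examples)
    dmesg | egrep -i 'ata error|i/o error|sector'

    # SMART health verdict, attribute table, and on-disk error log
    smartctl -H /dev/sdb
    smartctl -A /dev/sdb
    smartctl -l error /dev/sdb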
> Check 'iostat -dmx 5 5' (or some other numbers) -- if you see 100%+ disk
> utilization, that could be the dying one.
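(For anyone following along, the column to watch in iostat's extended output
is %util; device names below are placeholders:)

    # 5-second samples, 5 reports, extended per-device stats, MB/s
    iostat -dmx 5 5

    # or restrict the report to the suspect disks
    iostat -dmx sdb sdc 5 5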
About an hour after I applied osd_recovery_max_active=1, things settled
down. Looking at the graphs, it appears most of the OSDs crashed one more
time, then started working correctly.
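(For reference, pushing that setting to running OSDs is usually done with
injectargs; this is a sketch of the general form, not necessarily the exact
command used here:)

    # lower the recovery concurrency on every running OSD
    ceph tell osd.\* injectargs '--osd-recovery-max-active 1'

    # confirm via the admin socket on one of the OSD nodes (osd.0 is an example)
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep recovery_max_active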
Because of the very low recovery parameters, there's only a single
backfill running. `iostat -dmx 5 5` did report 100% util on the osd
that is backfilling, but I expected that. Once backfilling moves on to
a new osd, the 100% util follows the backfill operation.
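(Side note: to confirm which disk a backfilling OSD maps to, the default
mount point is enough; the osd id below is an example:)

    # which block device backs osd.12?
    df -h /var/lib/ceph/osd/ceph-12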
There's a lot of recovery left to finish. Hopefully things stay stable until
it completes. If they do, I'm adding osd_recovery_max_active=1 to ceph.conf.
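For anyone wanting to persist the same thing, the ceph.conf entry would look
like this (osd_max_backfills is shown commented out as the usual companion
knob, not because it's confirmed in this thread):

    [osd]
        osd recovery max active = 1
        # osd max backfills = 1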
--
*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email [email protected] <mailto:[email protected]>
*Central Desktop. Work together in ways you never thought possible.*
Connect with us Website <http://www.centraldesktop.com/> | Twitter
<http://www.twitter.com/centraldesktop> | Facebook
<http://www.facebook.com/CentralDesktop> | LinkedIn
<http://www.linkedin.com/groups?gid=147417> | Blog
<http://cdblog.centraldesktop.com/>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com