On 5/7/14 15:33, Dimitri Maziuk wrote:
> On 05/07/2014 04:11 PM, Craig Lewis wrote:
>> On 5/7/14 13:40, Sergey Malinin wrote:
>>> Check dmesg and SMART data on both nodes. This behaviour is similar to
>>> a failing hdd.
>> It does sound like a failing disk... but there's nothing in dmesg, and
>> smartmontools hasn't emailed me about a failing disk. The same thing is
>> happening to more than 50% of my OSDs, in both nodes.
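(For reference, the kind of per-disk check Sergey suggests would look roughly
like this; the device name below is a placeholder for whatever disk backs an
OSD:)

    # scan the kernel log for I/O trouble (search terms are examples)
    dmesg | egrep -i 'ata error|i/o error|sector'

    # SMART health verdict, attribute table, and on-disk error log
    smartctl -H /dev/sdb
    smartctl -A /dev/sdb
    smartctl -l error /dev/sdb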
> Check 'iostat -dmx 5 5' (or some other numbers) -- if you see 100%+ disk
> utilization, that could be the dying one.
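(For anyone following along, the column to watch in iostat's extended output
is %util; device names below are placeholders:)

    # 5-second samples, 5 reports, extended per-device stats, MB/s
    iostat -dmx 5 5

    # or restrict the report to the suspect disks
    iostat -dmx sdb sdc 5 5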
About an hour after I applied osd_recovery_max_active=1, things settled
down. Looking at the graphs, it appears most of the OSDs crashed one more
time, then started working correctly.
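(For reference, pushing that setting to running OSDs is usually done with
injectargs; this is a sketch of the general form, not necessarily the exact
command used here:)

    # lower the recovery concurrency on every running OSD
    ceph tell osd.\* injectargs '--osd-recovery-max-active 1'

    # confirm via the admin socket on one of the OSD nodes (osd.0 is an example)
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep recovery_max_active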
Because of the very low recovery parameters, there's only a single
backfill running. `iostat -dmx 5 5` did report 100% util on the osd
that is backfilling, but I expected that. Once backfilling moves on to
a new osd, the 100% util follows the backfill operation.
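(Side note: to confirm which disk a backfilling OSD maps to, the default
mount point is enough; the osd id below is an example:)

    # which block device backs osd.12?
    df -h /var/lib/ceph/osd/ceph-12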
There's a lot of recovery left to finish. Hopefully things stay stable until
it completes. If they do, I'm adding osd_recovery_max_active=1 to ceph.conf.
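For anyone wanting to persist the same thing, the ceph.conf entry would look
like this (osd_max_backfills is shown commented out as the usual companion
knob, not because it's confirmed in this thread):

    [osd]
        osd recovery max active = 1
        # osd max backfills = 1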
--
*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email [email protected] <mailto:[email protected]>
*Central Desktop. Work together in ways you never thought possible.*
Connect with us Website <http://www.centraldesktop.com/> | Twitter
<http://www.twitter.com/centraldesktop> | Facebook
<http://www.facebook.com/CentralDesktop> | LinkedIn
<http://www.linkedin.com/groups?gid=147417> | Blog
<http://cdblog.centraldesktop.com/>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com