On 5/7/14 15:33, Dimitri Maziuk wrote:
On 05/07/2014 04:11 PM, Craig Lewis wrote:
On 5/7/14 13:40, Sergey Malinin wrote:
Check dmesg and SMART data on both nodes. This behaviour is similar to a
failing HDD.


It does sound like a failing disk... but there's nothing in dmesg, and
smartmontools hasn't emailed me about a failing disk. The same thing is
happening to more than 50% of my OSDs, on both nodes.
Check 'iostat -dmx 5 5' (or some other interval and count) -- if you see 100%
disk utilization on one drive, that could be the dying one.
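
For anyone doing the dmesg/SMART checks by hand rather than waiting on the smartd emails, they look roughly like this (a sketch; /dev/sdb is just an example device name):

    dmesg | tail -n 100        # recent kernel messages; look for ata/scsi I/O errors
    smartctl -H /dev/sdb       # overall SMART health verdict
    smartctl -A /dev/sdb       # attributes: reallocated / pending sectors, etc.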



About an hour after I applied osd_recovery_max_active=1, things settled down. Looking at the graphs, it appears that most of the OSDs crashed one more time, then started working correctly.
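
In case it's useful to anyone else: a setting like that can be pushed to the running OSDs without a restart, roughly like this (a sketch; the injectargs syntax can vary slightly between releases):

    # push the new value to every running OSD
    ceph tell osd.* injectargs '--osd-recovery-max-active 1'

    # spot-check one daemon, from the node hosting osd.0
    ceph daemon osd.0 config show | grep osd_recovery_max_active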

Because of the very low recovery parameters, there's only a single backfill running. `iostat -dmx 5 5` did report 100% util on the OSD that is backfilling, but I expected that. Once backfilling moves on to a new OSD, the 100% util follows the backfill operation.


There's a lot of recovery left to finish. Hopefully the stability lasts until it completes. If so, I'll add osd_recovery_max_active=1 to ceph.conf.
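
The ceph.conf change itself should just be the following (a sketch; I'm assuming the usual [osd] section placement):

    [osd]
        osd_recovery_max_active = 1

with the injectargs above covering the daemons that are already running until their next restart.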

--

*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email [email protected]


