I had some issues with OSD flapping after 2 days of recovery. It
appears to be related to swapping, even though I have plenty of RAM for
the number of OSDs I have. The cluster was completely unusable, and I
ended up rebooting all the nodes. It's been great ever since, but I'm
assuming it will happen again.
Details are below, but does anybody have any idea what happened?
I noticed some lumpy data distribution on my OSDs. Following the advice
on the mailing list, I increased pg_num and pgp_num to the values from
the usual formula (roughly OSDs * 100 / replica count, rounded up to a
power of 2). .rgw.buckets is the only large pool, so I increased pg_num
and pgp_num from 128 to 2048 on that one pool. The cluster status
changed to HEALTH_WARN, there were 1920 PGs in state
active+remapped+wait_backfill, and 32% of the objects were degraded.
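For reference, the split was just the standard pool commands, along the
lines of:

    ceph osd pool set .rgw.buckets pg_num 2048
    ceph osd pool set .rgw.buckets pgp_num 2048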
Recovery was slow, and we were having some performance issues. I
lowered osd_max_backfills from 10 to 2, and osd_recovery_op_priority
from 10 to 2. This didn't slow the recovery down much, but made my
application much more responsive. My journals are on the OSD disks (no
SSDs). I believe the osd_max_backfills was the more important change,
but it's much slower to test than the osd_recovery_op_priority change.
Aside from those two, my notes say I changed and reverted
osd_disk_threads, osd_op_threads, and osd_recovery_threads. All changes
were pushed out through the admin sockets, e.g.:

    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config set osd_max_backfills 2
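The osd_recovery_op_priority change went out the same way, repeated for
each OSD's socket on each node, along the lines of:

    ceph --admin-daemon /var/run/ceph/ceph-osd.N.asok config set osd_recovery_op_priority 2

(with N being each OSD id).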
I watched the cluster on and off over the weekend. Ceph was steadily
recovering. It was down to ~900 PGs in active+remapped+wait_backfill,
with 17% of objects degraded. A few OSDs had been marked down and
recovered, so a few tens of PGs were in state
active+degraded+remapped+wait_backfill and
active+degraded+remapped+backfilling. While poking around, I noticed
kswapd was using between 5% and 30% CPU on all nodes. It was bursty,
peaking at 30% CPU for about 5 seconds out of every 30. Swap usage
wasn't increasing, and kswapd appeared to be doing a lot of nothing.
My machines have 8 OSDs and 36GB of RAM. top said that all machines
were caching 30GB of data. The ceph-osd daemons were each using between
0.5GB and 1.2GB of RAM; I don't have the exact numbers, but I believe
it was about 5GB total for all 8 ceph-osd daemons.
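For what it's worth, nothing Ceph-specific is needed to see this; it's
all visible from stock tools on each node, e.g.:

    vmstat 5
    free -m
    top -b -n 1 | grep -E 'kswapd|ceph-osd'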
A few hours later, the OSDs really started flapping. They were being
voted unresponsive and marked down faster than they could rejoin. At one
point, a third of the OSDs were marked down. ceph -w was complaining
about hundreds of slow requests older than 900 seconds. Most RGW
accesses were failing with HTTP timeouts. kswapd was using a consistent
33% CPU on all nodes, with no variance that I could see. To add insult
to injury, the cluster was running a scrub and a deep scrub at the same time.
I eventually rebooted all nodes in the cluster, one at a time. Once
quorum was reestablished, recovery proceeded at the original speed. The
OSDs are responding, and all my RGW requests are returning in a
reasonable amount of time. There are no complaints of slow requests in
ceph -w. kswapd is using 0% of the CPU.
I'm running Ceph 0.72.2 on Ubuntu 12.04.4, with kernel 3.5.0-37-generic
#58~precise1-Ubuntu SMP.
I monitor the running version as well as the installed version, so I
know that all daemons were restarted after the 0.72.1 -> 0.72.2
upgrade. That happened on Jan 22nd.
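For anyone who wants to do the same check, both are easy to query: the
running version from the admin socket and the installed version from
dpkg, e.g.

    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok version
    dpkg-query -W -f='${Version}\n' ceph

on each node.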
Any idea what happened? I'm assuming it will happen again if recovery
takes long enough.
--
*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email [email protected] <mailto:[email protected]>