In the interest of removing variables, I removed all snapshots on all
pools, then restarted all ceph daemons at the same time. This brought
up osd.8 as well.
The cluster started recovering. Now osd.4 and osd.13 are doing the same
thing (hung at 100% CPU, no disk I/O).
Any suggestions for how I can see what the hung OSDs are doing? The logs
don't look interesting. Is there a higher log level I can use?
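(I'm assuming I could bump the log level on a running daemon with
something like ceph tell osd.4 injectargs '--debug-osd 20 --debug-ms 1',
or ask the admin socket what it's working on with ceph daemon osd.4
dump_ops_in_flight, but presumably both require the daemon to be
responsive enough to answer.)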
I'm trying to use strace on osd.4:
strace -tt -f -ff -o ./ceph-osd.4.strace -x /usr/bin/ceph-osd
--cluster=ceph -i 4 -f
So far, strace is running, and the process isn't hung. After I ran
this, the cluster finally finished backfilling the last of the PGs (all
on osd.4).
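(If I need to catch one of these in the hung state again, I assume I can
attach to the already-running daemon with something like
strace -tt -f -p <pid of osd.4>
instead of restarting it under strace; <pid of osd.4> is just a
placeholder for whatever ps reports.)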
Since the cluster is healthy again, I killed the strace and started the
daemon normally (start ceph-osd id=4). Things seem fine now. I'm going
to let it scrub and deepscrub overnight. I'll restart radosgw-agent
tomorrow.
On 3/27/14 10:44, Craig Lewis wrote:
The osd.8 log shows it doing some deep scrubbing here. Perhaps that is
what caused your earlier issues with CPU usage?
When I first noticed the CPU usage, I checked iotop and iostat. Both
said there was no disk activity, on any OSD.
At 14:17:25, I ran radosgw-admin --name=client.radosgw.ceph1c regions
list && radosgw-admin --name=client.radosgw.ceph1c regionmap get. The
regions list command hung, and I killed it. At 14:18:15, I stopped
ceph-osd id=8.
At 14:18:45, I ran radosgw-admin --name=client.radosgw.ceph1c regions
list && radosgw-admin --name=client.radosgw.ceph1c regionmap get. It
returned successfully.
At 14:19:10, I stopped ceph-osd id=4.
Since you've got the noout flag set, when osd.8 goes down any objects
for which osd.8 is the primary will not be readable. Ceph reads from
primaries, and the noout flag prevents another osd from being selected
(which would happen if osd.8 were marked out), so these objects (which
apparently happen to include some needed for regions list or regionmap
get) are inaccessible.
Josh
Taking osd.8 down (regardless of the noout flag) was the only way to
get things to respond. I have not set nodown, just noout.
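(For what it's worth, I believe ceph osd map <pool> <object> would show
which OSD is primary for a given object, e.g. something like
ceph osd map .rgw.root region_info.default
though I'm only guessing that .rgw.root and region_info.default are the
pool and object the region config actually lives in.)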
When I got in this morning, I had 4 more flapping OSDs: osd.4, osd.12,
osd.13, and osd.6. All 4 daemons were using 100% CPU and doing no disk
I/O.
osd.1 and osd.14 are the only ones currently using disk I/O.
There are 3 PGs being deepscrubbed:
root@ceph1c:/var/log/radosgw-agent# ceph pg dump | grep deep
dumped all in format plain
pg_stat objects mip degr unf bytes log disklog
state state_stamp v reported up acting last_scrub
scrub_stamp last_deep_scrub deep_scrub_stamp
11.774 8682 0 0 0 7614655060 3001 3001
active+clean+scrubbing+deep 2014-03-27 10:20:30.598032
8381'5180514 8521:6520833 [13,4] [13,4] 7894'5176984
2014-03-20 04:41:48.762996 7894'5176984 2014-03-20 04:41:48.762996
11.698 8587 0 0 0 7723737171 3001 3001
active+clean+scrubbing+deep 2014-03-27 10:16:31.292487
8383'483312 8521:618864 [14,1] [14,1] 7894'479783
2014-03-20 03:53:18.024015 7894'479783 2014-03-20 03:53:18.024015
11.d8 8743 0 0 0 7570365909 3409 3409
active+clean+scrubbing+deep 2014-03-27 10:15:39.558121
8396'1753407 8521:2417672 [12,6] [12,6] 7894'1459230
2014-03-20 02:40:22.123236 7894'1459230 2014-03-20 02:40:22.123236
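(I assume ceph pg 11.774 query would show more detail on the state of
that deep scrub, if the primary, osd.13, will even answer.)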
These PGs are on the 6 OSDs mentioned. osd.1 and osd.14 are not using
100% CPU and are using disk IO. osd.12, osd.6, osd.4, and osd.13 are
using 100% CPU, and 0 kB/s of disk IO. Here's iostat on ceph0c, which
contains osd.1 (/dev/sdd), osd.4 (/dev/sde), and osd.6 (/dev/sdh):
root@ceph0c:/var/log/ceph# iostat -p sdd,sde,sdh 1
Linux 3.5.0-46-generic (ceph0c) 03/27/2014 _x86_64_ (8 CPU)
<snip>
avg-cpu: %user %nice %system %iowait %steal %idle
32.64 0.00 5.52 4.42 0.00 57.42
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sdd 113.00 900.00 0.00 900 0
sdd1 113.00 900.00 0.00 900 0
sde 0.00 0.00 0.00 0 0
sde1 0.00 0.00 0.00 0 0
sdh 0.00 0.00 0.00 0 0
sdh1 0.00 0.00 0.00 0 0
avg-cpu: %user %nice %system %iowait %steal %idle
29.90 0.00 4.41 2.82 0.00 62.87
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sdd 181.00 1332.00 0.00 1332 0
sdd1 181.00 1332.00 0.00 1332 0
sde 22.00 8.00 328.00 8 328
sde1 18.00 8.00 328.00 8 328
sdh 18.00 4.00 228.00 4 228
sdh1 15.00 4.00 228.00 4 228
avg-cpu: %user %nice %system %iowait %steal %idle
30.21 0.00 4.26 1.71 0.00 63.82
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sdd 180.00 1044.00 200.00 1044 200
sdd1 177.00 1044.00 200.00 1044 200
sde 0.00 0.00 0.00 0 0
sde1 0.00 0.00 0.00 0 0
sdh 0.00 0.00 0.00 0 0
sdh1 0.00 0.00 0.00 0 0
So it's not strictly zero disk activity, but it's pretty close. The
disks continue to show 0 kB_read and 0 kB_wrtn for the next 60 seconds.
That's much lower than I would expect for OSDs executing a deepscrub.
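(For completeness, iostat -x 1 would also show %util and await, which
would catch disks that are busy with small I/Os that don't add up to
much throughput; with tps sitting at 0 here, I'm assuming that's not
what's happening.)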
I restarted the 4 flapping OSDs. They recovered, then started
flapping within 5 minutes. I shut all of the ceph daemons down, and
rebooted all nodes at the same time. The OSDs return to 100% CPU
usage very soon after boot.
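(I'm guessing the next step is to grab a stack trace from one of the
spinning daemons, e.g. gdb -p <pid> -batch -ex 'thread apply all bt',
assuming the ceph debug symbols are installed, or at least to check
per-thread CPU with top -H -p <pid>.)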
I was going to ask if I should zap osd.8 and re-add it to the
cluster. I don't think that's possible now.
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com