I have 4 OSDs that won't stay in the cluster. I restart them, they join
for a bit, then get kicked out because they stop responding to pings
from the other OSDs.

I don't know what the issue is. The disks look fine: SMART reports no
errors or reallocated sectors, and iostat says the disks are nearly idle
when the OSD stops responding. dmesg says it's restarting the process,
but doesn't say anything else interesting. kern.log doesn't say anything.

I'm out of ideas, and I'm ready to gamble.

So I have two ideas that might fix the issue. I can upgrade Ceph from
Emperor to Firefly, or I can upgrade Ubuntu 12.04 (kernel
3.5.0-49-generic) to 14.04 (kernel 3.13.0-24-generic). If I upgrade to
14.04, I plan to hold Ceph on Emperor for the time being.

My PG states:
1989 active+clean
17 active+remapped
12 down+peering
507 active+degraded
1 active+degraded+remapped+wait_backfill
28 stale+down+peering
2 active+recovering+degraded+remapped
1 down+remapped+peering
3 incomplete
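
To put those numbers in perspective, here is a rough Python sketch that tallies the listing above. The counts are copied verbatim from my PG states; the function names (parse_pg_states, summarize) and the grouping into "inactive" and "degraded" are just my own illustration, not anything Ceph ships. It assumes that a PG whose state does not include "active" is not serving client I/O.

```python
# PG state counts copied verbatim from the listing above.
PG_STATES = """\
1989 active+clean
17 active+remapped
12 down+peering
507 active+degraded
1 active+degraded+remapped+wait_backfill
28 stale+down+peering
2 active+recovering+degraded+remapped
1 down+remapped+peering
3 incomplete
"""


def parse_pg_states(text):
    """Parse '<count> <state>' lines into a {state: count} dict."""
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        count, state = line.split(None, 1)
        counts[state] = counts.get(state, 0) + int(count)
    return counts


def summarize(counts):
    """Group PGs by the flags in their '+'-joined state string.

    Assumption (mine): a PG without the 'active' flag is not
    serving client I/O.
    """
    total = sum(counts.values())
    clean = counts.get("active+clean", 0)
    inactive = sum(n for s, n in counts.items()
                   if "active" not in s.split("+"))
    degraded = sum(n for s, n in counts.items()
                   if "degraded" in s.split("+"))
    return {"total": total, "clean": clean,
            "inactive": inactive, "degraded": degraded}


summary = summarize(parse_pg_states(PG_STATES))
for name, value in summary.items():
    print(f"{name:>8}: {value:5d} ({100.0 * value / summary['total']:.1f}%)")
```

By that tally, 1989 of 2560 PGs are active+clean, 510 are degraded, and 44 (the down, stale, and incomplete ones) are not active at all. Those 44 are the ones I'm most worried about making worse.
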
If I upgrade to Firefly, am I going to make things worse?

Any opinions on which gamble is more likely to pay off?

I plan to do both upgrades, but I want to do them one at a time unless
doing both at once is necessary. Which upgrade should I attempt first?
--
*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309