I would suggest either adding one new disk to each of the two machines,
or increasing osd_backfill_full_ratio to something like 0.90 or 0.92 from
the default 0.85.
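
If you go the ratio route, a rough sketch (untested on your cluster; 0.90
is just an example value) is to inject it into the running OSDs and also
persist it in ceph.conf so it survives restarts:

  # runtime change, applies only to OSDs that are currently up:
  ceph tell osd.* injectargs '--osd-backfill-full-ratio 0.90'

  # ceph.conf, under [osd], for daemons started later:
  # osd backfill full ratio = 0.90

Either way this only buys backfill headroom; the lasting fix is the extra
capacity.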
/Maged
On 2017-08-28 08:01, hjcho616 wrote:
> Hello!
>
> I've been using Ceph for a long time, mostly for network CephFS storage,
> even before the Argonaut release! It's been working very well for me. Yes,
> I had some power outages before and asked a few questions on this list,
> and they got resolved happily! Thank you all!
>
> Not sure why, but we've been having quite a few power outages lately. Ceph
> appeared to be running OK through them.. so I was pretty happy and didn't
> think much of it... until yesterday. When I started to move some videos to
> CephFS, Ceph decided that it was full, although df showed only 54%
> utilization! Then I looked, and some of the OSDs were down! (Only 3 at that
> point!)
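>
> Side note: Ceph judges fullness per OSD rather than by the cluster-wide
> average that df reports, so a single OSD crossing its full ratio can make
> the whole cluster refuse writes at 54% overall, especially with OSDs down
> and data squeezed onto the survivors. Something like this should show the
> per-OSD picture (output omitted here):
>
>   ceph df
>   ceph osd df
>   ceph health detail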
>
> I am running a pretty simple Ceph configuration... one machine, named
> MDS1, running the MDS and mon, and two OSD machines, named OSD1 and OSD2,
> each with five 2TB HDDs and one SSD for the journal.
>
> At the time, I was running Jewel 10.2.2. I looked at some of the downed
> OSDs' log files and googled the errors... they appeared to be tied to
> version 10.2.2, so I upgraded everything to 10.2.9. Well, that didn't
> solve my problems.. =P While I was looking into all of this, there was
> another power outage! D'oh! I may need to invest in a UPS or something...
> Until then, all of the down OSDs were on OSD2, but this time OSD1 took a
> hit! It couldn't boot because osd.0's filesystem was damaged... I tried
> xfs_repair -L /dev/sdb1 as the command-line output suggested, and I was
> able to mount it again, phew, reboot... then /dev/sdb1 was no longer
> accessible! Noooo!!!
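>
> (For anyone retracing this, the rough sequence was the following; the
> device and mount point are specific to my box, with
> /var/lib/ceph/osd/ceph-0 being the default path for osd.0:
>
>   umount /var/lib/ceph/osd/ceph-0   # if it was still mounted
>   xfs_repair -L /dev/sdb1           # -L zeroes the XFS log; last resort
>   mount /dev/sdb1 /var/lib/ceph/osd/ceph-0
>
> Probably worth checking dmesg and smartctl -a /dev/sdb next to see
> whether the disk itself is failing rather than just the filesystem.)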
>
> So this is what I have today! I am a bit concerned, as half of the OSDs
> are down! And osd.0 doesn't look good at all...
> # ceph osd tree
> ID WEIGHT   TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 16.24478 root default
> -2  8.12239     host OSD1
>  1  1.95250         osd.1       up  1.00000          1.00000
>  0  1.95250         osd.0     down        0          1.00000
>  7  0.31239         osd.7       up  1.00000          1.00000
>  6  1.95250         osd.6       up  1.00000          1.00000
>  2  1.95250         osd.2       up  1.00000          1.00000
> -3  8.12239     host OSD2
>  3  1.95250         osd.3     down        0          1.00000
>  4  1.95250         osd.4     down        0          1.00000
>  5  1.95250         osd.5     down        0          1.00000
>  8  1.95250         osd.8     down        0          1.00000
>  9  0.31239         osd.9       up  1.00000          1.00000
>
> This looked a lot better before that last power outage... =( I can't
> mount it anymore!
> # ceph health
> HEALTH_ERR 22 pgs are stuck inactive for more than 300 seconds; 44 pgs
> backfill_toofull; 80 pgs backfill_wait; 122 pgs degraded; 6 pgs down; 8 pgs
> inconsistent; 6 pgs peering; 2 pgs recovering; 18 pgs recovery_wait; 16 pgs
> stale; 122 pgs stuck degraded; 6 pgs stuck inactive; 16 pgs stuck stale; 159
> pgs stuck unclean; 102 pgs stuck undersized; 102 pgs undersized; 1 requests
> are blocked > 32 sec; recovery 1803466/4503980 objects degraded (40.042%);
> recovery 692976/4503980 objects misplaced (15.386%); recovery 147/2251990
> unfound (0.007%); 1 near full osd(s); 54 scrub errors; mds cluster is
> degraded; no legacy OSD present but 'sortbitwise' flag is not set
>
> Each of the OSDs is showing a different failure signature.
>
> I've uploaded the OSD logs with debug osd = 20, debug filestore = 20, and
> debug ms = 20. You can find them at the links below. Let me know if there
> is a preferred way to share these!
> https://drive.google.com/open?id=0By7YztAJNGUWQXItNzVMR281Snc
> (ceph-osd.3.log)
> https://drive.google.com/open?id=0By7YztAJNGUWYmJBb3RvLVdSQWc
> (ceph-osd.4.log)
> https://drive.google.com/open?id=0By7YztAJNGUWaXhRMlFOajN6M1k
> (ceph-osd.5.log)
> https://drive.google.com/open?id=0By7YztAJNGUWdm9BWFM5a3ExOFE
> (ceph-osd.8.log)
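>
> (For reference, the debug levels went into ceph.conf under [osd] before
> restarting the daemons, roughly:
>
>   [osd]
>   debug osd = 20
>   debug filestore = 20
>   debug ms = 20
>
> I believe the same can be injected into a live OSD with ceph tell osd.N
> injectargs, but these daemons don't stay up long enough for that.)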
>
> So how does this look? Can this be fixed? =) If so, please let me know
> how. I used to take backups, but since the data grew so big I wasn't able
> to anymore... and I would like to get most of it back if I can. Please
> let me know if you need more info!
>
> Thank you!
>
> Regards,
> Hong
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com