Hi Deepak,

As Wildo pointed out in the thread you linked, "osd crush update on start" and "osd crush location" are quick ways to fix this. By default an OSD updates its own CRUSH location when it starts, which would explain why your NVMe OSDs moved back under their real hosts after the reboots.

If you are doing custom locations (like tiering NVMe vs HDD), "osd crush location hook" (doc: http://docs.ceph.com/docs/master/rados/operations/crush-map/#custom-location-hooks ) is a good option as well. It lets you set the CRUSH location of each OSD from a script, and it shouldn't be too hard to detect whether the OSD is NVMe or SATA and set its location based on that. It's really nice to see new OSDs arrive in the right location automatically when you add them.

Shameless plug: you can find an example in this blog post: http://www.root314.com/2017/01/15/Ceph-storage-tiers/#tiered-crushmap

I hope it helps.
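To make the hook idea concrete, here is a minimal sketch in Python (it is not the script from the blog post). Ceph runs the hook with --cluster, --id and --type arguments and reads a single line with the location from its standard output. Everything else here is an assumption for illustration: the data directory layout /var/lib/ceph/osd/<cluster>-<id>, detecting NVMe from the device name, the helper names backing_device and crush_location, and the "root=default", "rack1-sata" / "rack1-nvme" and "-nvme" host suffix placeholders borrowed from the tree in this thread. Adjust them to your own hierarchy.

```python
#!/usr/bin/env python3
"""Hypothetical CRUSH location hook: prints one line with the OSD's location."""
import argparse
import os
import socket


def backing_device(osd_id, cluster="ceph", mounts_path="/proc/mounts"):
    """Return the device mounted at the OSD data directory, or None if not found."""
    # Assumed layout: the OSD data directory is a mount point of its own.
    data_dir = "/var/lib/ceph/osd/{}-{}".format(cluster, osd_id)
    try:
        with open(mounts_path) as mounts:
            for line in mounts:
                fields = line.split()
                if len(fields) >= 2 and fields[1] == data_dir:
                    return fields[0]
    except OSError:
        pass
    return None


def crush_location(hostname, device, rack="rack1"):
    """Build the location string; root and rack names are placeholders."""
    # Assumption: NVMe devices are named like /dev/nvme0n1p1.
    is_nvme = device is not None and os.path.basename(device).startswith("nvme")
    tier = "nvme" if is_nvme else "sata"
    host = hostname + "-nvme" if is_nvme else hostname
    return "root=default rack={}-{} host={}".format(rack, tier, host)


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--cluster", default="ceph")
    parser.add_argument("--id", required=True)
    parser.add_argument("--type", default="osd")
    args = parser.parse_args(argv)
    hostname = socket.gethostname().split(".")[0]
    # Ceph expects exactly one line describing the location on stdout.
    print(crush_location(hostname, backing_device(args.id, args.cluster)))


if __name__ == "__main__":
    main()
```

You would then point "osd crush location hook" in ceph.conf at wherever you install the script, and test it by hand with something like "--cluster ceph --id 60 --type osd" before restarting any OSD.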
Cheers,
Maxime

On Sat, 1 Jul 2017 at 03:28 Deepak Naidu <[email protected]> wrote:
> OK, so it looks like this is Ceph crushmap behavior:
> http://docs.ceph.com/docs/master/rados/operations/crush-map/
>
> --
> Deepak
>
> *From:* ceph-users [mailto:[email protected]] *On Behalf Of* Deepak Naidu
> *Sent:* Friday, June 30, 2017 7:06 PM
> *To:* David Turner; [email protected]
> *Subject:* Re: [ceph-users] 300 active+undersized+degraded+remapped
>
> OK, I fixed the issue, but this is very weird. I will list the steps so
> it's easy for others to check when they hit a similar issue.
>
> 1) I had created a rack-aware OSD tree.
> 2) I have SATA OSDs and NVMe OSDs.
> 3) I created a rack-aware policy for both the SATA and the NVMe OSDs.
> 4) The NVMe OSDs were used for the CephFS metadata pool.
> 5) Recently, when I tried rebooting an OSD node, it seemed that my
>    journal volumes, which were on NVMe, didn't start up because of the
>    udev rules, and I had to create a startup script to fix them.
> 6) With that in place, I rebooted all the OSD nodes one by one while
>    monitoring the ceph status.
> 7) I was at the third-to-last node when I noticed the PG stuck warning.
>    Not sure when and what happened, but I started getting this PG stuck
>    issue (which is listed in my original email).
> 8) I wasted time looking at the issue/error, but then I found the
>    pool 100% used issue.
> 9) Now when I ran ceph osd tree, it looked like my NVMe OSDs had gone
>    back to the host-level OSDs rather than the newly created/mapped
>    NVMe rack level, i.e. no OSDs under the nvme host names. This was
>    the issue.
> 10) Luckily I had kept a backup of the compiled crushmap. I imported
>     it and now the pool status is OK.
>
> But my question is: how did Ceph re-map the CRUSH rule?
>
> I had to create a "new host entry" for NVMe in the crushmap, i.e.
>
> host OSD1-nvme  -- This is just a dummy entry in the crushmap, i.e. it
>                    doesn't resolve to any hostname
> host OSD1       -- This is the actual hostname; it resolves to an IP
>
> Is that the issue?
>
> Current status
>
>      health HEALTH_OK
>      osdmap e5108: 610 osds: 610 up, 610 in
>             flags sortbitwise,require_jewel_osds
>       pgmap v247114: 15450 pgs, 3 pools, 322 GB data, 86102 objects
>             1155 GB used, 5462 TB / 5463 TB avail
>                15450 active+clean
>
>     Pool1      15  233M    0  1820T  3737
>     Pool2      16  0       0  1820T  0
>     Pool Meta  17  34928k  0  2357G  28
>
> *Partial list of my osd tree*
>
> -15     2.76392  rack rack1-nvme
> -18     0.69098      host OSD1-nvme
>  60     0.69098          osd.60    up  1.00000  1.00000
> -21     0.69098      host OSD2-nvme
> 243     0.69098          osd.243   up  1.00000  1.00000
> -24     0.69098      host OSD3-NGN1-nvme
> 426     0.69098          osd.426   up  1.00000  1.00000
>  -1  5456.27734  root default
> -12  2182.51099      rack rack1-sata
>  -2   545.62775          host OSD1
>   0     9.09380              osd.0     up  1.00000  1.00000
>   1     9.09380              osd.1     up  1.00000  1.00000
>   2     9.09380              osd.2     up  1.00000  1.00000
>   3     9.09380              osd.3     up  1.00000  1.00000
>  -2   545.62775          host OSD2
>   0     9.09380              osd.0     up  1.00000  1.00000
>   1     9.09380              osd.1     up  1.00000  1.00000
>   2     9.09380              osd.2     up  1.00000  1.00000
>   3     9.09380              osd.3     up  1.00000  1.00000
>  -2   545.62775          host OSD2
>   0     9.09380              osd.0     up  1.00000  1.00000
>   1     9.09380              osd.1     up  1.00000  1.00000
>   2     9.09380              osd.2     up  1.00000  1.00000
>   3     9.09380              osd.3     up  1.00000  1.00000
>
> --
> Deepak
>
> *From:* David Turner [mailto:[email protected] <[email protected]>]
> *Sent:* Friday, June 30, 2017 6:36 PM
> *To:* Deepak Naidu; [email protected]
> *Subject:* Re: [ceph-users] 300 active+undersized+degraded+remapped
>
> ceph status
> ceph osd tree
>
> Is your meta pool on SSDs instead of the same root and OSDs as the rest
> of the cluster?
>
> On Fri, Jun 30, 2017, 9:29 PM Deepak Naidu <[email protected]> wrote:
>
> Hello,
>
> I am getting the errors below and I am unable to resolve them even
> after stopping and starting the OSDs. All the OSDs seem to be up.
>
> How do I repair the OSDs or fix them manually? I am using CephFS. Oddly,
> ceph df is showing the metadata pool as 100% used (with usage shown in
> KB), but the pool is 1886G (with 3 copies). I can still write to the
> CephFS without any issue. I am not sure why Ceph is reporting the wrong
> info of 100% full.
>
> ceph version 10.2.7
>
>      health HEALTH_WARN
>             300 pgs degraded
>             300 pgs stuck degraded
>             300 pgs stuck unclean
>             300 pgs stuck undersized
>             300 pgs undersized
>             recovery 28/19674 objects degraded (0.142%)
>             recovery 56/19674 objects misplaced (0.285%)
>
> GLOBAL:
>     SIZE   AVAIL  RAW USED  %RAW USED
>     5463T  5462T  187G      0
> POOLS:
>     NAME      ID  USED    %USED   MAX AVAIL  OBJECTS
>     Pool1     15  233M    0       1820T      3737
>     Pool2     16  0       0       1820T      0
>     PoolMeta  17  34719k  100.00  0          28
>
> Any help is appreciated.
>
> --
> Deepak
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
