Hi Craig,

Recreating the missing PGs fixed it. Thanks for your help.
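For the archives, the recreation step described in the quoted thread below amounts to roughly the following. This is a sketch, not the exact commands run on this cluster: the dump_stuck listing is just one way to find the missing PG IDs, and whether "ceph osd lost" wants the --yes-i-really-mean-it confirmation depends on the release.

    # list the PGs that are stuck (the missing ones showed up as stale here)
    ceph pg dump_stuck stale

    # ask Ceph to recreate each missing PG, one at a time, e.g.
    ceph pg force_create_pg 2.33

    # if a PG stays stuck in "creating", mark the rebuilt OSDs as lost first,
    # then run force_create_pg again
    ceph osd lost <OSDID> --yes-i-really-mean-it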
But when I tried to mount the filesystem, it gave me "mount error 5". I tried to restart the MDS server, but it won't work; it tells me that it's laggy/unresponsive. BTW, all these machines are VMs.

[jshah@Lab-cephmon001 ~]$ ceph health detail
HEALTH_WARN mds cluster is degraded; mds Lab-cephmon001 is laggy
mds cluster is degraded
mds.Lab-cephmon001 at 17.147.16.111:6800/3745284 rank 0 is replaying journal
mds.Lab-cephmon001 at 17.147.16.111:6800/3745284 is laggy/unresponsive

(A sketch of checking the MDS state and retrying the mount is below, after the quoted thread.)

—Jiten

On Nov 20, 2014, at 4:20 PM, JIten Shah <[email protected]> wrote:

> Ok. Thanks.
>
> —Jiten
>
> On Nov 20, 2014, at 2:14 PM, Craig Lewis <[email protected]> wrote:
>
>> If there's no data to lose, tell Ceph to re-create all the missing PGs:
>>
>> ceph pg force_create_pg 2.33
>>
>> Repeat for each of the missing PGs. If that doesn't do anything, you might
>> need to tell Ceph that you lost the OSDs. For each OSD you moved, run
>> ceph osd lost <OSDID>, then try the force_create_pg command again.
>>
>> If that doesn't work, you can keep fighting with it, but it'll be faster to
>> rebuild the cluster.
>>
>> On Thu, Nov 20, 2014 at 1:45 PM, JIten Shah <[email protected]> wrote:
>> Thanks for your help.
>>
>> I was using Puppet to install the OSDs, and it takes a path rather than a
>> device name. Since the path specified was incorrect, it created the OSDs
>> in a path inside the root volume.
>>
>> All 3 of the OSDs were rebuilt at the same time, because the cluster was
>> unused and we had not put any data in it.
>>
>> Is there any way to recover from this, or should I rebuild the cluster
>> altogether?
>>
>> —Jiten
>>
>> On Nov 20, 2014, at 1:40 PM, Craig Lewis <[email protected]> wrote:
>>
>>> So you have your crushmap set to choose osd instead of choose host?
>>>
>>> Did you wait for the cluster to recover between each OSD rebuild? If you
>>> rebuilt all 3 OSDs at the same time (or without waiting for a complete
>>> recovery between them), that would cause this problem.
>>>
>>> On Thu, Nov 20, 2014 at 11:40 AM, JIten Shah <[email protected]> wrote:
>>> Yes, it was a healthy cluster, and I had to rebuild because the OSDs got
>>> accidentally created on the root disk. Out of 4 OSDs I had to rebuild 3 of
>>> them.
>>>
>>> [jshah@Lab-cephmon001 ~]$ ceph osd tree
>>> # id    weight    type name                 up/down  reweight
>>> -1      0.5       root default
>>> -2      0.09999       host Lab-cephosd005
>>> 4       0.09999           osd.4             up       1
>>> -3      0.09999       host Lab-cephosd001
>>> 0       0.09999           osd.0             up       1
>>> -4      0.09999       host Lab-cephosd002
>>> 1       0.09999           osd.1             up       1
>>> -5      0.09999       host Lab-cephosd003
>>> 2       0.09999           osd.2             up       1
>>> -6      0.09999       host Lab-cephosd004
>>> 3       0.09999           osd.3             up       1
>>>
>>> [jshah@Lab-cephmon001 ~]$ ceph pg 2.33 query
>>> Error ENOENT: i don't have pgid 2.33
>>>
>>> —Jiten
>>>
>>> On Nov 20, 2014, at 11:18 AM, Craig Lewis <[email protected]> wrote:
>>>
>>>> Just to be clear, this is from a cluster that was healthy, had a disk
>>>> replaced, and hasn't returned to healthy? It's not a new cluster that has
>>>> never been healthy, right?
>>>>
>>>> Assuming it's an existing cluster, how many OSDs did you replace? It
>>>> almost looks like you replaced multiple OSDs at the same time, and lost
>>>> data because of it.
>>>>
>>>> Can you give us the output of `ceph osd tree` and `ceph pg 2.33 query`?
>>>>
>>>> On Wed, Nov 19, 2014 at 2:14 PM, JIten Shah <[email protected]> wrote:
>>>> After rebuilding a few OSDs, I see that the PGs are stuck in degraded
>>>> mode. Some are in the unclean state and others are in the stale state.
>>>> Somehow the MDS is also degraded. How do I get the OSDs and the MDS back
>>>> to healthy? I have read through the documentation and searched the web,
>>>> but no luck so far.
>>>>
>>>> pg 2.33 is stuck unclean since forever, current state stale+active+degraded+remapped, last acting [3]
>>>> pg 0.30 is stuck unclean since forever, current state stale+active+degraded+remapped, last acting [3]
>>>> pg 1.31 is stuck unclean since forever, current state stale+active+degraded, last acting [2]
>>>> pg 2.32 is stuck unclean for 597129.903922, current state stale+active+degraded, last acting [2]
>>>> pg 0.2f is stuck unclean for 597129.903951, current state stale+active+degraded, last acting [2]
>>>> pg 1.2e is stuck unclean since forever, current state stale+active+degraded+remapped, last acting [3]
>>>> pg 2.2d is stuck unclean since forever, current state stale+active+degraded+remapped, last acting [2]
>>>> pg 0.2e is stuck unclean since forever, current state stale+active+degraded+remapped, last acting [3]
>>>> pg 1.2f is stuck unclean for 597129.904015, current state stale+active+degraded, last acting [2]
>>>> pg 2.2c is stuck unclean since forever, current state stale+active+degraded+remapped, last acting [3]
>>>> pg 0.2d is stuck stale for 422844.566858, current state stale+active+degraded, last acting [2]
>>>> pg 1.2c is stuck stale for 422598.539483, current state stale+active+degraded+remapped, last acting [3]
>>>> pg 2.2f is stuck stale for 422598.539488, current state stale+active+degraded+remapped, last acting [3]
>>>> pg 0.2c is stuck stale for 422598.539487, current state stale+active+degraded+remapped, last acting [3]
>>>> pg 1.2d is stuck stale for 422598.539492, current state stale+active+degraded+remapped, last acting [3]
>>>> pg 2.2e is stuck stale for 422598.539496, current state stale+active+degraded+remapped, last acting [3]
>>>> pg 0.2b is stuck stale for 422598.539491, current state stale+active+degraded+remapped, last acting [3]
>>>> pg 1.2a is stuck stale for 422598.539496, current state stale+active+degraded+remapped, last acting [3]
>>>> pg 2.29 is stuck stale for 422598.539504, current state stale+active+degraded+remapped, last acting [3]
>>>> .
>>>> .
>>>> .
>>>> 6 ops are blocked > 2097.15 sec
>>>> 3 ops are blocked > 2097.15 sec on osd.0
>>>> 2 ops are blocked > 2097.15 sec on osd.2
>>>> 1 ops are blocked > 2097.15 sec on osd.4
>>>> 3 osds have slow requests
>>>> recovery 40/60 objects degraded (66.667%)
>>>> mds cluster is degraded
>>>> mds.Lab-cephmon001 at X.X.16.111:6800/3424727 rank 0 is replaying journal
>>>>
>>>> —Jiten
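Regarding the mount error 5 at the top of this message: error 5 is EIO, and with CephFS it usually just means there is no active MDS to serve the mount while mds.Lab-cephmon001 is still replaying its journal (the laggy flag comes from missed beacons, which slow VMs are prone to). A rough sketch of waiting for the MDS and then retrying; the monitor host, mount point and secretfile path below are placeholders, not taken from this cluster:

    # watch the MDS state; wait for up:active rather than up:replay / laggy
    ceph mds stat
    ceph -s

    # once an MDS is active, retry the mount (adjust address, path and credentials)
    mount -t ceph <mon-host>:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret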
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
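A footnote on the "choose osd instead of choose host" question in the quoted thread: the failure domain is visible in the CRUSH rules, with the pool replica counts next to it. A sketch, assuming nothing cluster-specific:

    # dump the CRUSH rules; in the chooseleaf step, "type host" spreads replicas
    # across hosts, while "type osd" only spreads them across OSDs
    ceph osd crush rule dump

    # show the pools with their replica counts ("replicated size N")
    ceph osd dump | grep 'replicated size'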
