Dear Robert,

Yes, you're right. The two OSDs removed from those PGs were on the same host, which contradicts my rules (that's one of the reasons I removed them). Unfortunately the partitions of those disks have all been formatted, so I cannot recover the data.

However, running "ceph pg force_create_pg <pg ID>" on each stale PG and then restarting the OSD daemons did clean up the stale PGs. Now my ceph health is OK and the RBD service works normally.
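Roughly, what I ran was the following (the filter and the restart syntax are approximate and depend on your shell and init scripts; I have sysvinit-style service scripts here):

    # collect the stale PG IDs and force-create each of them
    ceph pg dump_stuck stale | awk '/^[0-9]+\./ {print $1}' > stale_pgs.txt
    while read pg; do ceph pg force_create_pg "$pg"; done < stale_pgs.txt

    # then restart the OSD daemons so the new (empty) PGs peer and go active+clean
    # (repeat for each OSD still in the cluster, e.g. osd.0, osd.2, ...)
    service ceph restart osd.0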

Many thanks for your help,
FaHui


Robert LeBlanc wrote on 2015/4/24 at 10:08 AM:

What hosts were those OSDs on? I'm concerned that the two OSDs for some of the PGs were adjacent, and if that placed them on the same host, it would be contrary to your rules and something deeper would be wrong.

Did you format the disks that were taken out of the cluster? Can you mount the partitions and see the files and directories? If so, you can probably recover the data using the recovery/dev tools.

You may be able to force create the missing PGs using ceph force-create <pg.id>. This may or may not work, I don't remember.

If you just don't care about losing the data, you can delete the pool and create a new one. This should work for sure, but it loses any data you might have still had. If this pool was full of RBD images, then there is a high probability that all of your RBD images had chunks in the missing PGs. If you choose not to try to restore the PGs using the tools, I'd be inclined to delete the pool and restore from backup, so as not to be surprised by data corruption in the images. Neither option is ideal or quick.
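If you go that route, it's roughly the following (the pool name and PG counts here are placeholders, and the delete is irreversible, so double-check the name first):

    # delete the damaged pool -- this discards whatever data is still in it
    ceph osd pool delete <pool-name> <pool-name> --yes-i-really-really-mean-it

    # recreate it with a PG count suited to your cluster
    ceph osd pool create <pool-name> 512 512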

Robert LeBlanc

Sent from a mobile device, please excuse any typos.

On Apr 23, 2015 6:42 PM, "FaHui Lin" <[email protected]> wrote:

    Hi, thank you for your response.

    Well, I've not only taken out but also completely removed both
    OSDs of that PG (by "ceph osd rm" and by deleting everything in
    /var/lib/ceph/osd/<related OSDs>), and likewise for all the other
    stale PGs.

    The main problem is that those stale PGs (which are missing all
    the OSDs I removed) not only put ceph health into warning, but
    also prevent other machines from mounting the Ceph RBD.

    Here's the full CRUSH map. The OSDs I removed were osd.5 through osd.19.

        # begin crush map
        tunable choose_local_tries 0
        tunable choose_local_fallback_tries 0
        tunable choose_total_tries 500

        # devices
        device 0 osd.0
        device 1 device1
        device 2 osd.2
        device 3 osd.3
        device 4 osd.4
        device 5 device5
        device 6 device6
        device 7 device7
        device 8 device8
        device 9 device9
        device 10 device10
        device 11 device11
        device 12 device12
        device 13 device13
        device 14 device14
        device 15 device15
        device 16 device16
        device 17 device17
        device 18 device18
        device 19 device19
        device 20 osd.20
        device 21 osd.21
        device 22 osd.22
        device 23 osd.23
        device 24 osd.24
        device 25 osd.25
        device 26 osd.26
        device 27 osd.27

        # types
        type 0 osd
        type 1 host
        type 2 rack
        type 3 row
        type 4 room
        type 5 datacenter
        type 6 root

        # buckets
        host XX-ceph01 {
                id -2           # do not change unnecessarily
                # weight 160.040
                alg straw
                hash 0  # rjenkins1
                item osd.0 weight 40.010
                item osd.2 weight 40.010
                item osd.3 weight 40.010
                item osd.4 weight 40.010
        }
        host XX-ceph02 {
                id -3           # do not change unnecessarily
                # weight 320.160
                alg straw
                hash 0  # rjenkins1
                item osd.20 weight 40.020
                item osd.21 weight 40.020
                item osd.22 weight 40.020
                item osd.23 weight 40.020
                item osd.24 weight 40.020
                item osd.25 weight 40.020
                item osd.26 weight 40.020
                item osd.27 weight 40.020
        }
        root default {
                id -1           # do not change unnecessarily
                # weight 480.200
                alg straw
                hash 0  # rjenkins1
                item XX-ceph01 weight 160.040
                item XX-ceph02 weight 320.160
        }

        # rules
        rule data {
                ruleset 0
                type replicated
                min_size 1
                max_size 10
                step take default
                step chooseleaf firstn 0 type host
                step emit
        }
        rule metadata {
                ruleset 1
                type replicated
                min_size 1
                max_size 10
                step take default
                step chooseleaf firstn 0 type host
                step emit
        }
        rule rbd {
                ruleset 2
                type replicated
                min_size 1
                max_size 10
                step take default
                step chooseleaf firstn 0 type host
                step emit
        }

        # end crush map
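    (As a side note: I understand the placement behaviour of these rules
    can be simulated offline with crushtool; a rough sketch, in case it
    helps, with the rule number and flags possibly needing adjustment:)

        # simulate the rbd rule (ruleset 2) with 2 replicas against the compiled map
        ceph osd getcrushmap -o crush.bin
        crushtool -i crush.bin --test --rule 2 --num-rep 2 --show-mappings | head
        # report only the inputs that could not be mapped to 2 OSDs:
        crushtool -i crush.bin --test --rule 2 --num-rep 2 --show-bad-mappings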

    List of some stale PGs:

        pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
        17.c6 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:09.358613 0'0 2706:216 [19,13] 19 [19,13] 19 0'0 2015-04-16 02:29:34.882038 0'0 2015-04-16 02:29:34.882038
        17.c7 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:28.304621 0'0 2718:262 [15,18] 15 [15,18] 15 0'0 2015-04-20 09:15:39.363310 0'0 2015-04-20 09:15:39.363310
        17.c1 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:01.073681 0'0 2706:199 [19,16] 19 [19,16] 19 0'0 2015-04-15 12:37:11.741251 0'0 2015-04-15 12:37:11.741251
        17.de 0 0 0 0 0 0 0 0 stale+active+undersized+degraded 2015-04-20 23:41:29.436796 0'0 2718:267 [15] 15 [15] 15 0'0 2015-04-13 07:56:01.760824 0'0 2015-04-13 07:56:01.760824
        17.da 0 0 0 0 0 0 0 0 stale+active+undersized+degraded 2015-04-20 23:41:50.001087 0'0 2718:232 [14] 14 [14] 14 0'0 2015-04-19 15:45:53.304596 0'0 2015-04-19 15:45:53.304596
        17.d9 0 0 0 0 0 0 0 0 stale+active+undersized+degraded 2015-04-20 23:41:29.472983 0'0 2718:270 [14] 14 [14] 14 0'0 2015-04-16 01:55:44.183550 0'0 2015-04-16 01:55:44.183550
        17.d7 0 0 0 0 0 0 0 0 stale+active+undersized+degraded 2015-04-20 23:41:53.839134 0'0 2718:68 [17] 17 [17] 17 0'0 2015-04-16 00:06:27.998210 0'0 2015-04-16 00:06:27.998210
        17.d5 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:28.311352 0'0 2718:226 [18,17] 18 [18,17] 18 0'0 2015-04-15 20:52:33.372369 0'0 2015-04-15 20:52:33.372369
        17.d0 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:24.850188 0'0 2718:213 [15,12] 15 [15,12] 15 0'0 2015-04-19 15:40:32.215234 0'0 2015-04-19 15:40:32.215234
        17.d1 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:24.849996 0'0 2718:227 [15,12] 15 [15,12] 15 0'0 2015-04-15 19:03:38.137147 0'0 2015-04-15 19:03:38.137147
        17.ae 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:28.310506 0'0 2718:231 [18,12] 18 [18,12] 18 0'0 2015-04-16 02:23:35.031329 0'0 2015-04-16 02:23:35.031329
        17.ac 0 0 0 0 0 0 0 0 stale+active+undersized+degraded 2015-04-20 23:41:50.002406 0'0 2718:66 [12] 12 [12] 12 0'0 2015-04-16 02:23:33.023476 0'0 2015-04-16 02:23:33.023476
        17.aa 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:25.983034 0'0 2718:213 [15,14] 15 [15,14] 15 0'0 2015-04-19 15:32:38.896039 0'0 2015-04-19 15:32:38.896039
        17.ab 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:24.836133 0'0 2718:260 [12,17] 12 [12,17] 12 0'0 2015-04-19 15:32:44.905707 0'0 2015-04-19 15:32:44.905707
        17.a8 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:09.361319 0'0 2706:212 [19,13] 19 [19,13] 19 0'0 2015-04-16 02:23:32.026015 0'0 2015-04-16 02:23:32.026015
        17.a6 0 0 0 0 0 0 0 0 stale+active+undersized+degraded 2015-04-20 23:41:50.002804 0'0 2718:96 [18] 18 [18] 18 0'0 2015-04-20 14:02:29.334181 0'0 2015-04-20 14:02:29.334181
        17.a4 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:28.310707 0'0 2718:232 [18,17] 18 [18,17] 18 0'0 2015-04-16 02:22:12.018136 0'0 2015-04-16 02:22:12.018136
        17.a2 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:11.624952 0'0 2718:200 [15,17] 15 [15,17] 15 0'0 2015-04-15 10:42:37.880699 0'0 2015-04-15 10:42:37.880699
        17.a0 0 0 0 0 0 0 0 0 stale+active+undersized+degraded 2015-04-20 23:41:29.469600 0'0 2718:66 [18] 18 [18] 18 0'0 2015-04-16 02:22:08.992748 0'0 2015-04-16 02:22:08.992748

    The OSDs of those PGs (both primary and secondary) are completely
    gone, and I cannot find a way to repair them.

    I have another machine with new drive partitions, and I tried to
    re-create the OSDs I had removed on it, but those would come up as
    osd.28, osd.29, etc. That's why I wondered how to change the ID
    number of an OSD.

    Regardless of the data loss (which I think has already happened),
    I'd like to get the Ceph service back to normal asap.
    Is there any way to deal with those stale PGs? (Such as recreating
    the OSDs they need, mapping existing OSDs to those PGs, or even
    killing those PGs?)
    And since I'm not experienced, I may need fairly concrete
    suggestions (i.e. the actual ceph commands). Many thanks for your
    help.

    Best Regards,
    FaHui


    Robert LeBlanc wrote on 2015/4/23 at 10:53 PM:
    A full CRUSH dump would be helpful, as well as knowing which OSDs
    you took out. If you didn't take 17 out as well as 15, then you
    might be OK. If the OSDs still show up in your CRUSH map, then try
    to remove them from the CRUSH map with 'ceph osd crush rm osd.15'.

    If you took out both OSDs, you will need to use some of the
    recovery tools. I believe the procedure is roughly: mount the
    drive in another box, extract the PGs needed, then shut down the
    primary OSD for that PG, inject the PG into the OSD, then start
    it up and it should replicate. I haven't done it myself (probably
    something I should do in case I ever run into the problem).
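    If you try it, the export/import step with ceph-objectstore-tool
    looks roughly like this (the PG id, mount points and OSD number
    below are only examples, and on 0.87 the binary may be named
    ceph_objectstore_tool; I haven't verified this end to end):

        # on a box where the old OSD's data partition is mounted
        ceph-objectstore-tool --data-path /mnt/old-osd-15 \
            --journal-path /mnt/old-osd-15/journal \
            --pgid 17.a2 --op export --file /tmp/17.a2.export

        # stop the current primary OSD for that PG, import, then start it again
        ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-27 \
            --journal-path /var/lib/ceph/osd/ceph-27/journal \
            --op import --file /tmp/17.a2.export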

    On Thu, Apr 23, 2015 at 2:00 AM, FaHui Lin <[email protected]> wrote:

        Dear Ceph experts,

        I'm a very new Ceph user. I made a blunder: I removed some
        OSDs (and all files in the related directories) before Ceph
        finished rebalancing data and migrating PGs.

        On top of the data loss, I have the following problems:

        1) There are always stale PGs showing in ceph status (with a
        health warning). Take one of the stale PGs, 17.a2, for example:

            # ceph -v
            ceph version 0.87.1 (283c2e7cfa2457799f534744d7d549f83ea1335e)

            # ceph -s
                cluster 3f81b47e-fb15-4fbb-9fee-0b1986dfd7ea
                 health HEALTH_WARN 203 pgs degraded; 366 pgs stale;
            203 pgs stuck degraded; 366 pgs stuck stale; 203 pgs
            stuck unclean; 203 pgs stuck undersized; 203 pgs
            undersized; 154 requests are blocked > 32 sec; recovery
            153738/18991802 objects degraded (0.809%)
                 monmap e1: 1 mons at {...=...:6789/0}, election
            epoch 1, quorum 0 tw-ceph01
                 osdmap e3697: 12 osds: 12 up, 12 in
                  pgmap v21296531: 1156 pgs, 18 pools, 36929 GB data,
            9273 kobjects
                        72068 GB used, 409 TB / 480 TB avail
                        153738/18991802 objects degraded (0.809%)
                             163 stale+active+clean
                             786 active+clean
                             203 stale+active+undersized+degraded
                               4 active+clean+scrubbing+deep


            # ceph pg dump_stuck stale | grep 17.a2
            17.a2 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:11.624952 0'0 2718:200 [15,17] 15 [15,17] 15 0'0 2015-04-15 10:42:37.880699 0'0 2015-04-15 10:42:37.880699

            # ceph pg repair 17.a2
            Error EAGAIN: pg 17.a2 primary osd.15 not up

            # ceph pg scrub 17.a2
            Error EAGAIN: pg 17.a2 primary osd.15 not up

            # ceph pg map 17.a2
            osdmap e3695 pg 17.a2 (17.a2) -> up [27,3] acting [27,3]


        where osd.15 had already been removed. The PG now seems to map
        to existing OSDs ([27,3]).
        Can this PG eventually be recovered by being remapped to those
        existing OSDs? If not, what can I do about this kind of stale PG?

        2) I tried to solve the problem above by re-creating the OSDs,
        but failed. The reason is that I cannot create an OSD with the
        same ID as the one I removed, say osd.15 (or change the ID of
        an existing OSD).
        Is there any way to change the ID of an OSD? (By the way, I'm
        surprised that this issue can hardly be found on the internet.)

        3) I tried another thing: I dumped the crushmap and removed
        everything related to the removed OSDs (including their entries
        in the devices and buckets sections), roughly the dump/edit/set
        cycle sketched after the output below. However, after I set the
        crushmap and dumped it out again, the OSDs' lines still appear
        in the devices section (though not in the buckets section),
        such as:

            # devices
            device 0 osd.0
            device 2 osd.2
            device 3 osd.3
            device 4 osd.4
            device 5 device5
            ...
            device 14 device14
            device 15 device15
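        (The dump/edit/set cycle I mean is roughly the following; the
        file names are just the ones I happened to use:)

            ceph osd getcrushmap -o crush.bin      # grab the compiled map
            crushtool -d crush.bin -o crush.txt    # decompile to text
            # ... edit crush.txt, deleting the lines for the removed OSDs ...
            crushtool -c crush.txt -o crush.new    # recompile
            ceph osd setcrushmap -i crush.new      # inject it back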


        Is there any way to remove them? Does it matter when I want
        to add new OSDs?

        Please inform me if you have any comments. Thank you.

        Best Regards,
        FaHui






_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
