Hi, thank you for your response.

Well, I've not only taken both OSDs of that pg out but also removed them completely (with "ceph osd rm" and by deleting everything under /var/lib/ceph/osd/<related OSDs>), and I did the same for all the other stale pgs.
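
For reference, what I did for each of those OSDs was roughly the following (taking osd.15 as an example; the data directory follows the usual /var/lib/ceph/osd/ceph-<id> layout here, which may differ on your setup):

   ceph osd out osd.15
   ceph osd rm osd.15
   rm -rf /var/lib/ceph/osd/ceph-15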

The main problem is that those stale pgs (which are missing all the OSDs I removed) not only trigger a ceph health warning, but also prevent other machines from mounting the ceph rbd.

Here's the full crush map. The OSDs I removed were osd.5 through osd.19.

   # begin crush map
   tunable choose_local_tries 0
   tunable choose_local_fallback_tries 0
   tunable choose_total_tries 500

   # devices
   device 0 osd.0
   device 1 device1
   device 2 osd.2
   device 3 osd.3
   device 4 osd.4
   device 5 device5
   device 6 device6
   device 7 device7
   device 8 device8
   device 9 device9
   device 10 device10
   device 11 device11
   device 12 device12
   device 13 device13
   device 14 device14
   device 15 device15
   device 16 device16
   device 17 device17
   device 18 device18
   device 19 device19
   device 20 osd.20
   device 21 osd.21
   device 22 osd.22
   device 23 osd.23
   device 24 osd.24
   device 25 osd.25
   device 26 osd.26
   device 27 osd.27

   # types
   type 0 osd
   type 1 host
   type 2 rack
   type 3 row
   type 4 room
   type 5 datacenter
   type 6 root

   # buckets
   host XX-ceph01 {
            id -2           # do not change unnecessarily
            # weight 160.040
            alg straw
            hash 0  # rjenkins1
            item osd.0 weight 40.010
            item osd.2 weight 40.010
            item osd.3 weight 40.010
            item osd.4 weight 40.010
   }
   host XX-ceph02 {
            id -3           # do not change unnecessarily
            # weight 320.160
            alg straw
            hash 0  # rjenkins1
            item osd.20 weight 40.020
            item osd.21 weight 40.020
            item osd.22 weight 40.020
            item osd.23 weight 40.020
            item osd.24 weight 40.020
            item osd.25 weight 40.020
            item osd.26 weight 40.020
            item osd.27 weight 40.020
   }
   root default {
            id -1           # do not change unnecessarily
            # weight 480.200
            alg straw
            hash 0  # rjenkins1
            item XX-ceph01 weight 160.040
            item XX-ceph02 weight 320.160
   }

   # rules
   rule data {
            ruleset 0
            type replicated
            min_size 1
            max_size 10
            step take default
            step chooseleaf firstn 0 type host
            step emit
   }
   rule metadata {
            ruleset 1
            type replicated
            min_size 1
            max_size 10
            step take default
            step chooseleaf firstn 0 type host
            step emit
   }
   rule rbd {
            ruleset 2
            type replicated
            min_size 1
            max_size 10
            step take default
            step chooseleaf firstn 0 type host
            step emit
   }

   # end crush map

List of some stale pgs:

   pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
   17.c6 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:09.358613 0'0 2706:216 [19,13] 19 [19,13] 19 0'0 2015-04-16 02:29:34.882038 0'0 2015-04-16 02:29:34.882038
   17.c7 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:28.304621 0'0 2718:262 [15,18] 15 [15,18] 15 0'0 2015-04-20 09:15:39.363310 0'0 2015-04-20 09:15:39.363310
   17.c1 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:01.073681 0'0 2706:199 [19,16] 19 [19,16] 19 0'0 2015-04-15 12:37:11.741251 0'0 2015-04-15 12:37:11.741251
   17.de 0 0 0 0 0 0 0 0 stale+active+undersized+degraded 2015-04-20 23:41:29.436796 0'0 2718:267 [15] 15 [15] 15 0'0 2015-04-13 07:56:01.760824 0'0 2015-04-13 07:56:01.760824
   17.da 0 0 0 0 0 0 0 0 stale+active+undersized+degraded 2015-04-20 23:41:50.001087 0'0 2718:232 [14] 14 [14] 14 0'0 2015-04-19 15:45:53.304596 0'0 2015-04-19 15:45:53.304596
   17.d9 0 0 0 0 0 0 0 0 stale+active+undersized+degraded 2015-04-20 23:41:29.472983 0'0 2718:270 [14] 14 [14] 14 0'0 2015-04-16 01:55:44.183550 0'0 2015-04-16 01:55:44.183550
   17.d7 0 0 0 0 0 0 0 0 stale+active+undersized+degraded 2015-04-20 23:41:53.839134 0'0 2718:68 [17] 17 [17] 17 0'0 2015-04-16 00:06:27.998210 0'0 2015-04-16 00:06:27.998210
   17.d5 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:28.311352 0'0 2718:226 [18,17] 18 [18,17] 18 0'0 2015-04-15 20:52:33.372369 0'0 2015-04-15 20:52:33.372369
   17.d0 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:24.850188 0'0 2718:213 [15,12] 15 [15,12] 15 0'0 2015-04-19 15:40:32.215234 0'0 2015-04-19 15:40:32.215234
   17.d1 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:24.849996 0'0 2718:227 [15,12] 15 [15,12] 15 0'0 2015-04-15 19:03:38.137147 0'0 2015-04-15 19:03:38.137147
   17.ae 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:28.310506 0'0 2718:231 [18,12] 18 [18,12] 18 0'0 2015-04-16 02:23:35.031329 0'0 2015-04-16 02:23:35.031329
   17.ac 0 0 0 0 0 0 0 0 stale+active+undersized+degraded 2015-04-20 23:41:50.002406 0'0 2718:66 [12] 12 [12] 12 0'0 2015-04-16 02:23:33.023476 0'0 2015-04-16 02:23:33.023476
   17.aa 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:25.983034 0'0 2718:213 [15,14] 15 [15,14] 15 0'0 2015-04-19 15:32:38.896039 0'0 2015-04-19 15:32:38.896039
   17.ab 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:24.836133 0'0 2718:260 [12,17] 12 [12,17] 12 0'0 2015-04-19 15:32:44.905707 0'0 2015-04-19 15:32:44.905707
   17.a8 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:09.361319 0'0 2706:212 [19,13] 19 [19,13] 19 0'0 2015-04-16 02:23:32.026015 0'0 2015-04-16 02:23:32.026015
   17.a6 0 0 0 0 0 0 0 0 stale+active+undersized+degraded 2015-04-20 23:41:50.002804 0'0 2718:96 [18] 18 [18] 18 0'0 2015-04-20 14:02:29.334181 0'0 2015-04-20 14:02:29.334181
   17.a4 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:28.310707 0'0 2718:232 [18,17] 18 [18,17] 18 0'0 2015-04-16 02:22:12.018136 0'0 2015-04-16 02:22:12.018136
   17.a2 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:11.624952 0'0 2718:200 [15,17] 15 [15,17] 15 0'0 2015-04-15 10:42:37.880699 0'0 2015-04-15 10:42:37.880699
   17.a0 0 0 0 0 0 0 0 0 stale+active+undersized+degraded 2015-04-20 23:41:29.469600 0'0 2718:66 [18] 18 [18] 18 0'0 2015-04-16 02:22:08.992748 0'0 2015-04-16 02:22:08.992748

The OSDs of those pgs (both primary and secondary) are completely gone, and I cannot find a way to repair them.

I have another machine with new drive partitions, and I tried to re-create the OSDs I had removed there, but the new OSDs would come out as osd.28, osd.29, etc. That's why I was wondering how to change the ID number of an OSD.

Regardless of the data loss (which I think has already happened), I'd like to get the ceph service back to normal asap. Is there any way to deal with those stale pgs (such as re-creating the OSDs they need, injecting existing OSDs into those pgs, or even killing those pgs)? Since I'm not experienced, I may need fairly concrete instructions (i.e. the actual ceph commands). Many thanks for your help.

Best Regards,
FaHui


Robert LeBlanc wrote on 2015/4/23 at 10:53 PM:
A full CRUSH dump would be helpful, as well as knowing which OSDs you took out. If you didn't take 17 out as well as 15, then you might be OK. If the OSDs still show up in your CRUSH map, then try to remove them with 'ceph osd crush rm osd.15'.

If you took out both OSDs, you will need to use some of the recovery tools. I believe the procedure is roughly: mount the drive in another box, extract the PGs needed, then shut down the primary OSD for that PG, inject the PG into the OSD, then start it up and it should replicate. I haven't done it myself (probably something I should do in case I ever run into the problem).
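
A rough sketch of that export/import with the objectstore tool (untested, so treat it only as a starting point; the tool is named ceph_objectstore_tool around giant/0.87 and ceph-objectstore-tool in later releases, and the mount point and file names below are just placeholders):

   # on the box where the old OSD's disk is mounted, e.g. at /mnt/old-osd
   ceph_objectstore_tool --data-path /mnt/old-osd --journal-path /mnt/old-osd/journal --op export --pgid 17.a2 --file /tmp/17.a2.export

   # on the node hosting the current primary OSD for that PG, with that OSD daemon stopped
   ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-<id> --journal-path /var/lib/ceph/osd/ceph-<id>/journal --op import --file /tmp/17.a2.export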

On Thu, Apr 23, 2015 at 2:00 AM, FaHui Lin <[email protected]> wrote:

    Dear Ceph experts,

    I'm a very new Ceph user. I made the blunder of removing some
    OSDs (and all files in the related directories) before Ceph
    finished rebalancing data and migrating pgs.

    Setting aside the data loss, I am running into the following problems:

    1) There are always stale pgs showing in ceph status (with a health
    warning). Take one of the stale pgs, 17.a2, for example:

        # ceph -v
        ceph version *0.87.1* (283c2e7cfa2457799f534744d7d549f83ea1335e)

        # ceph -s
            cluster 3f81b47e-fb15-4fbb-9fee-0b1986dfd7ea
             health HEALTH_WARN 203 pgs degraded; 366 pgs stale; 203
        pgs stuck degraded; *366 pgs stuck stale*; 203 pgs stuck
        unclean; 203 pgs stuck undersized; 203 pgs undersized; 154
        requests are blocked > 32 sec; recovery 153738/18991802
        objects degraded (0.809%)
             monmap e1: 1 mons at {...=...:6789/0}, election epoch 1,
        quorum 0 tw-ceph01
             osdmap e3697: 12 osds: 12 up, 12 in
              pgmap v21296531: 1156 pgs, 18 pools, 36929 GB data, 9273
        kobjects
                    72068 GB used, 409 TB / 480 TB avail
                    153738/18991802 objects degraded (0.809%)
                         163 stale+active+clean
                         786 active+clean
                         203 stale+active+undersized+degraded
                           4 active+clean+scrubbing+deep


        # ceph pg dump_stuck stale | grep 17.a2
        17.a2 0 0 0 0 0 0 0 0 stale+active+clean 2015-04-20 09:16:11.624952 0'0 2718:200 [15,17] 15 [15,17] 15 0'0 2015-04-15 10:42:37.880699 0'0 2015-04-15 10:42:37.880699

        # ceph pg repair 17.a2
        Error EAGAIN: pg 17.a2 primary osd.15 not up

        # ceph pg scrub 17.a2
        Error EAGAIN: pg 17.a2 primary osd.15 not up

        # ceph pg map 17.a2
        osdmap e3695 pg 17.a2 (17.a2) -> up [27,3] acting [27,3]


    where osd.15 had already been removed. It seems to map to
    existing OSDs ([27, 3]) now.
    Can this pg eventually get recovered by moving to the existing
    OSDs? If not, what can I do about this kind of stale pg?

    2) I tried to solve the problem above by re-creating the removed
    OSDs, but failed. The reason is that I cannot create an OSD with the
    same ID as one I removed, say osd.15 (or change the id of an OSD).
    Is there any way to change the id of an OSD? (By the way, I'm
    surprised that this issue is hardly mentioned anywhere on the internet.)

    3) I tried another thing: dumping the crushmap and removing
    everything (including the devices and buckets sections) related to the
    OSDs I removed (the commands I ran are sketched at the end of this
    point). However, after I set the crushmap and dumped it out again,
    I found the removed OSDs' lines still appear in the devices
    section (not in the buckets section though), such as:

        # devices
        device 0 osd.0
        device 2 osd.2
        device 3 osd.3
        device 4 osd.4
        device 5 device5
        ...
        device 14 device14
        device 15 device15


    Is there any way to remove them? Does it matter when I want to add
    new OSDs?
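
    The dump/edit/set cycle I used was along these lines (the file names
    are just placeholders):

        ceph osd getcrushmap -o crushmap.bin
        crushtool -d crushmap.bin -o crushmap.txt
        # edit crushmap.txt to drop the removed OSDs' device and bucket entries
        crushtool -c crushmap.txt -o crushmap.new
        ceph osd setcrushmap -i crushmap.new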

    Please inform me if you have any comments. Thank you.

    Best Regards,
    FaHui


    _______________________________________________
    ceph-users mailing list
    [email protected] <mailto:[email protected]>
    http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


