What hosts were those OSDs on? I'm concerned that the two OSDs for some of the PGs were adjacent; if that placed both copies on the same host, it would be contrary to your CRUSH rules and something deeper is wrong.
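For any OSD that is still in the map, something along these lines should show where it sits (a rough sketch; pg 17.c6 and osd.20 are just examples pulled from your dump, substitute the ids you care about):

# ceph pg map 17.c6
# ceph osd tree
# ceph osd find 20

'ceph osd find' should print the host bucket the OSD lives under. For the OSDs you have already deleted you will have to go from an old 'ceph osd tree' output or your own notes, since they are no longer in the map.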
Did you format the disks that were taken out of the cluster? Can you mount the partitions and see the files and directories? If so, you can probably recover the data using the recovery/dev tools. You may be able to force create the missing PGs using 'ceph pg force_create_pg <pg.id>'. This may or may not work, I don't remember. If you just don't care about losing data, you can delete the pool and create a new one. This should work for sure, but loses any data that you might have still had.

If this pool was full of RBD, then there is a high possibility that all of your RBD images had chunks in the missing PGs. If you choose not to try to restore the PGs using the tools, I'd be inclined to delete the pool and restore from backup so as not to be surprised by data corruption in the images. Neither option is ideal or quick.
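To make that a bit more concrete, here is roughly what each option looks like from the command line. This is a sketch from memory, not a tested procedure: check the tool name and flags against your version (on 0.87 it may be spelled ceph_objectstore_tool rather than ceph-objectstore-tool), and the device path, mount point, pool name and pg counts below are placeholders. 17.a2 and osd.27 come from your own output (osd.27 is the current primary for 17.a2).

1) Recover a PG from an old disk and inject it into the current primary:

# mount -o ro /dev/sdX1 /mnt/old-osd
# ceph-objectstore-tool --data-path /mnt/old-osd --journal-path /mnt/old-osd/journal --op export --pgid 17.a2 --file /tmp/17.a2.export

then stop the OSD that is now primary for that PG (osd.27 here), import, and start it again:

# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-27 --journal-path /var/lib/ceph/osd/ceph-27/journal --op import --file /tmp/17.a2.export

2) Or force-create the missing PGs, accepting that whatever was in them is gone:

# ceph pg force_create_pg 17.a2

3) Or drop the pool entirely and start over (this destroys everything left in that pool; the pg counts are only an example):

# ceph osd pool delete <pool> <pool> --yes-i-really-really-mean-it
# ceph osd pool create <pool> 256 256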
Robert LeBlanc
Sent from a mobile device, please excuse any typos.

On Apr 23, 2015 6:42 PM, "FaHui Lin" <[email protected]> wrote:
> Hi, thank you for your response.
>
> Well, I've not only taken out but also totally removed both OSDs (by "ceph osd rm" and deleting everything in /var/lib/ceph/osd/<related OSDs>) of that pg (and similarly for all the other stale pgs).
>
> The main problem I have is that those stale pgs (missing all the OSDs I've removed) not only cause the ceph health warning, but other machines cannot mount the ceph rbd as well.
>
> Here's the full crush map. The OSDs I removed were osd.5~19.
>
> # begin crush map
> tunable choose_local_tries 0
> tunable choose_local_fallback_tries 0
> tunable choose_total_tries 500
>
> # devices
> device 0 osd.0
> device 1 device1
> device 2 osd.2
> device 3 osd.3
> device 4 osd.4
> device 5 device5
> device 6 device6
> device 7 device7
> device 8 device8
> device 9 device9
> device 10 device10
> device 11 device11
> device 12 device12
> device 13 device13
> device 14 device14
> device 15 device15
> device 16 device16
> device 17 device17
> device 18 device18
> device 19 device19
> device 20 osd.20
> device 21 osd.21
> device 22 osd.22
> device 23 osd.23
> device 24 osd.24
> device 25 osd.25
> device 26 osd.26
> device 27 osd.27
>
> # types
> type 0 osd
> type 1 host
> type 2 rack
> type 3 row
> type 4 room
> type 5 datacenter
> type 6 root
>
> # buckets
> host XX-ceph01 {
>         id -2           # do not change unnecessarily
>         # weight 160.040
>         alg straw
>         hash 0  # rjenkins1
>         item osd.0 weight 40.010
>         item osd.2 weight 40.010
>         item osd.3 weight 40.010
>         item osd.4 weight 40.010
> }
> host XX-ceph02 {
>         id -3           # do not change unnecessarily
>         # weight 320.160
>         alg straw
>         hash 0  # rjenkins1
>         item osd.20 weight 40.020
>         item osd.21 weight 40.020
>         item osd.22 weight 40.020
>         item osd.23 weight 40.020
>         item osd.24 weight 40.020
>         item osd.25 weight 40.020
>         item osd.26 weight 40.020
>         item osd.27 weight 40.020
> }
> root default {
>         id -1           # do not change unnecessarily
>         # weight 480.200
>         alg straw
>         hash 0  # rjenkins1
>         item XX-ceph01 weight 160.040
>         item XX-ceph02 weight 320.160
> }
>
> # rules
> rule data {
>         ruleset 0
>         type replicated
>         min_size 1
>         max_size 10
>         step take default
>         step chooseleaf firstn 0 type host
>         step emit
> }
> rule metadata {
>         ruleset 1
>         type replicated
>         min_size 1
>         max_size 10
>         step take default
>         step chooseleaf firstn 0 type host
>         step emit
> }
> rule rbd {
>         ruleset 2
>         type replicated
>         min_size 1
>         max_size 10
>         step take default
>         step chooseleaf firstn 0 type host
>         step emit
> }
>
> # end crush map
>
> List of some stale pgs:
>
> pg_stat  objects  mip  degr  misp  unf  bytes  log  disklog  state  state_stamp  v  reported  up  up_primary  acting  acting_primary  last_scrub  scrub_stamp  last_deep_scrub  deep_scrub_stamp
> 17.c6  0 0 0 0 0 0 0 0  stale+active+clean  2015-04-20 09:16:09.358613  0'0  2706:216  [19,13]  19  [19,13]  19  0'0  2015-04-16 02:29:34.882038  0'0  2015-04-16 02:29:34.882038
> 17.c7  0 0 0 0 0 0 0 0  stale+active+clean  2015-04-20 09:16:28.304621  0'0  2718:262  [15,18]  15  [15,18]  15  0'0  2015-04-20 09:15:39.363310  0'0  2015-04-20 09:15:39.363310
> 17.c1  0 0 0 0 0 0 0 0  stale+active+clean  2015-04-20 09:16:01.073681  0'0  2706:199  [19,16]  19  [19,16]  19  0'0  2015-04-15 12:37:11.741251  0'0  2015-04-15 12:37:11.741251
> 17.de  0 0 0 0 0 0 0 0  stale+active+undersized+degraded  2015-04-20 23:41:29.436796  0'0  2718:267  [15]  15  [15]  15  0'0  2015-04-13 07:56:01.760824  0'0  2015-04-13 07:56:01.760824
> 17.da  0 0 0 0 0 0 0 0  stale+active+undersized+degraded  2015-04-20 23:41:50.001087  0'0  2718:232  [14]  14  [14]  14  0'0  2015-04-19 15:45:53.304596  0'0  2015-04-19 15:45:53.304596
> 17.d9  0 0 0 0 0 0 0 0  stale+active+undersized+degraded  2015-04-20 23:41:29.472983  0'0  2718:270  [14]  14  [14]  14  0'0  2015-04-16 01:55:44.183550  0'0  2015-04-16 01:55:44.183550
> 17.d7  0 0 0 0 0 0 0 0  stale+active+undersized+degraded  2015-04-20 23:41:53.839134  0'0  2718:68  [17]  17  [17]  17  0'0  2015-04-16 00:06:27.998210  0'0  2015-04-16 00:06:27.998210
> 17.d5  0 0 0 0 0 0 0 0  stale+active+clean  2015-04-20 09:16:28.311352  0'0  2718:226  [18,17]  18  [18,17]  18  0'0  2015-04-15 20:52:33.372369  0'0  2015-04-15 20:52:33.372369
> 17.d0  0 0 0 0 0 0 0 0  stale+active+clean  2015-04-20 09:16:24.850188  0'0  2718:213  [15,12]  15  [15,12]  15  0'0  2015-04-19 15:40:32.215234  0'0  2015-04-19 15:40:32.215234
> 17.d1  0 0 0 0 0 0 0 0  stale+active+clean  2015-04-20 09:16:24.849996  0'0  2718:227  [15,12]  15  [15,12]  15  0'0  2015-04-15 19:03:38.137147  0'0  2015-04-15 19:03:38.137147
> 17.ae  0 0 0 0 0 0 0 0  stale+active+clean  2015-04-20 09:16:28.310506  0'0  2718:231  [18,12]  18  [18,12]  18  0'0  2015-04-16 02:23:35.031329  0'0  2015-04-16 02:23:35.031329
> 17.ac  0 0 0 0 0 0 0 0  stale+active+undersized+degraded  2015-04-20 23:41:50.002406  0'0  2718:66  [12]  12  [12]  12  0'0  2015-04-16 02:23:33.023476  0'0  2015-04-16 02:23:33.023476
> 17.aa  0 0 0 0 0 0 0 0  stale+active+clean  2015-04-20 09:16:25.983034  0'0  2718:213  [15,14]  15  [15,14]  15  0'0  2015-04-19 15:32:38.896039  0'0  2015-04-19 15:32:38.896039
> 17.ab  0 0 0 0 0 0 0 0  stale+active+clean  2015-04-20 09:16:24.836133  0'0  2718:260  [12,17]  12  [12,17]  12  0'0  2015-04-19 15:32:44.905707  0'0  2015-04-19 15:32:44.905707
> 17.a8  0 0 0 0 0 0 0 0  stale+active+clean  2015-04-20 09:16:09.361319  0'0  2706:212  [19,13]  19  [19,13]  19  0'0  2015-04-16 02:23:32.026015  0'0  2015-04-16 02:23:32.026015
> 17.a6  0 0 0 0 0 0 0 0  stale+active+undersized+degraded  2015-04-20 23:41:50.002804  0'0  2718:96  [18]  18  [18]  18  0'0  2015-04-20 14:02:29.334181  0'0  2015-04-20 14:02:29.334181
> 17.a4  0 0 0 0 0 0 0 0  stale+active+clean  2015-04-20 09:16:28.310707  0'0  2718:232  [18,17]  18  [18,17]  18  0'0  2015-04-16 02:22:12.018136  0'0  2015-04-16 02:22:12.018136
> 17.a2  0 0 0 0 0 0 0 0  stale+active+clean  2015-04-20 09:16:11.624952  0'0  2718:200  [15,17]  15  [15,17]  15  0'0  2015-04-15 10:42:37.880699  0'0  2015-04-15 10:42:37.880699
> 17.a0  0 0 0 0 0 0 0 0  stale+active+undersized+degraded  2015-04-20 23:41:29.469600  0'0  2718:66  [18]  18  [18]  18  0'0  2015-04-16 02:22:08.992748  0'0  2015-04-16 02:22:08.992748
>
> The OSDs of those pgs (either primary or secondary) are totally gone, and I cannot find a way to repair them.
>
> I've had another machine with new drive partitions, and I tried to re-create the OSDs I had removed on it, but those would become osd.28, 29, etc. That's why I wondered how to change the ID number of an OSD.
>
> Regardless of the data loss (which I think has already happened), I'd like to make the ceph service normal asap.
> Is there any way to deal with those stale pgs? (Such as re-creating the OSDs they need, injecting existing OSDs into those pgs, or even killing those pgs?)
> And since I'm not experienced, I may need more concrete comments (i.e. an approach with ceph commands). Many thanks for your help.
>
> Best Regards,
> FaHui
>
>
> On 2015/4/23 10:53 PM, Robert LeBlanc wrote:
>
> A full CRUSH dump would be helpful, as well as knowing which OSDs you took out. If you didn't take 17 out as well as 15, then you might be OK. If the OSDs still show up in your CRUSH map, then try to remove them from the CRUSH map with 'ceph osd crush rm osd.15'.
>
> If you took out both OSDs, you will need to use some of the recovery tools. I believe the procedure is roughly: mount the drive in another box, extract the PGs needed, then shut down the primary OSD for that PG, inject the PG into that OSD, then start it up and it should replicate. I haven't done it myself (probably something I should do in case I ever run into the problem).
>
> On Thu, Apr 23, 2015 at 2:00 AM, FaHui Lin <[email protected]> wrote:
>
>> Dear Ceph experts,
>>
>> I'm a very new Ceph user. I made a blunder: I removed some OSDs (and all files in the related directories) before Ceph finished rebalancing data and migrating pgs.
>>
>> Not to mention the data loss, I am running into these problems:
>>
>> 1) There are always stale pgs showing in ceph status (with a health warning). Take one of the stale pgs, 17.a2:
>>
>> # ceph -v
>> ceph version *0.87.1* (283c2e7cfa2457799f534744d7d549f83ea1335e)
>>
>> # ceph -s
>> cluster 3f81b47e-fb15-4fbb-9fee-0b1986dfd7ea
>> health HEALTH_WARN 203 pgs degraded; 366 pgs stale; 203 pgs stuck degraded; *366 pgs stuck stale*; 203 pgs stuck unclean; 203 pgs stuck undersized; 203 pgs undersized; 154 requests are blocked > 32 sec; recovery 153738/18991802 objects degraded (0.809%)
>> monmap e1: 1 mons at {...=...:6789/0}, election epoch 1, quorum 0 tw-ceph01
>> osdmap e3697: 12 osds: 12 up, 12 in
>> pgmap v21296531: 1156 pgs, 18 pools, 36929 GB data, 9273 kobjects
>> 72068 GB used, 409 TB / 480 TB avail
>> 153738/18991802 objects degraded (0.809%)
>> 163 stale+active+clean
>> 786 active+clean
>> 203 stale+active+undersized+degraded
>> 4 active+clean+scrubbing+deep
>>
>>
>> # ceph pg dump_stuck stale | grep 17.a2
>> 17.a2  0 0 0 0 0 0 0 0  stale+active+clean  2015-04-20 09:16:11.624952  0'0  2718:200  [15,17]  15  [15,17]  15  0'0  2015-04-15 10:42:37.880699  0'0  2015-04-15 10:42:37.880699
>>
>> # ceph pg repair 17.a2
>> Error EAGAIN: pg 17.a2 primary osd.15 not up
>>
>> # ceph pg scrub 17.a2
>> Error EAGAIN: pg 17.a2 primary osd.15 not up
>>
>> # ceph pg map 17.a2
>> osdmap e3695 pg 17.a2 (17.a2) -> up [27,3] acting [27,3]
>>
>>
>> where osd.15 had already been removed. It seems to map to the existing OSDs ([27, 3]).
>> Can this pg finally get recovered by changing to the existing OSDs? If not, what can I do about this kind of stale pg?
>>
>> 2) I tried to solve the problem above by creating the OSDs back, but failed. The reason was that I cannot create an OSD with the same ID as the one I removed, say osd.15 (or change the id of an OSD).
>> Is there any way to change the id of an OSD? (By the way, I'm surprised that this issue can hardly be found on the internet.)
>>
>> 3) I tried another thing: dumping the crushmap and removing everything (including the devices and buckets sections) related to the OSDs I removed. However, after I set the crushmap and dumped it out again, I found the OSDs' lines still appear in the devices section (not in the buckets section though), such as:
>>
>> # devices
>> device 0 osd.0
>> device 2 osd.2
>> device 3 osd.3
>> device 4 osd.4
>> *device 5 device5*
>> *...*
>> *device 14 device14*
>> *device 15 device15*
>>
>>
>> Is there any way to remove them? Does it matter when I want to add new OSDs?
>>
>> Please inform me if you have any comments. Thank you.
>>
>> Best Regards,
>> FaHui
>>
>>
>> _______________________________________________
>> ceph-users mailing list
>> [email protected]
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
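One more note on your question (2) about getting the old ids back: as far as I know you can't rename an existing OSD, but 'ceph osd create' always hands out the lowest free id, so if you purge every trace of an old entry first, a newly created OSD should come back with that id. A rough, untested sketch for one id (some of these may already error out with "not found" since you removed things by hand, which is fine; repeat per id):

# ceph osd crush remove osd.15
# ceph auth del osd.15
# ceph osd rm 15
# ceph osd create

And on question (3): the 'device 15 device15' style lines are, as far as I know, just how a decompiled map shows ids that currently have no OSD behind them. They should be harmless and will show up as 'osd.15' again once that id is reused.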
