I think this happened because the OSD was removed incorrectly...
Maybe a bug?

Do you think "ceph pg repair" will force the removal of the PG from the
missing OSD?
I am concerned about running "pg repair" or "osd lost" because it might
decide that the stuck copy holds the right data, act on it, and discard
the active running copy.


Regards.

Dimitar Boichev
SysAdmin Team Lead
AXSMarine Sofia
Phone: +359 889 22 55 42
Skype: dimitar.boichev.axsmarine
E-mail: dimitar.boic...@axsmarine.com


-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Stillwell, Bryan
Sent: Tuesday, February 23, 2016 7:31 PM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] osd not removed from crush map after ceph osd crush 
remove

Dimitar,

I agree that getting the cluster into a healthy state first is probably
the better idea.  Based on your pg query, it appears that you're using
only 1 replica.  Any idea why that would be?

The output should look like this (with 3 replicas):

osdmap e133481 pg 11.1b8 (11.1b8) -> up [13,58,37] acting [13,58,37]
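
To make the comparison concrete, here is a minimal Python sketch that parses a `ceph pg map` line and counts the OSDs in the acting set. The regular expression is an assumption based only on the two output lines quoted in this thread, and `parse_pg_map` is a hypothetical helper name, not part of any Ceph tooling.

```python
import re

# Shape of one `ceph pg map <pgid>` output line, inferred from the two
# examples in this thread (an assumption, not an official format spec).
PG_MAP_RE = re.compile(
    r"osdmap e(?P<epoch>\d+) pg (?P<pgid>\S+) \(\S+\) -> "
    r"up \[(?P<up>[\d,]*)\] acting \[(?P<acting>[\d,]*)\]"
)


def parse_pg_map(line):
    """Return the epoch, pgid, and up/acting OSD lists from one output line."""
    match = PG_MAP_RE.search(line)
    if match is None:
        raise ValueError("unrecognized pg map line: %r" % line)

    def ids(field):
        # "13,58,37" -> [13, 58, 37]; an empty set "[]" -> []
        return [int(x) for x in field.split(",") if x]

    return {
        "epoch": int(match.group("epoch")),
        "pgid": match.group("pgid"),
        "up": ids(match.group("up")),
        "acting": ids(match.group("acting")),
    }


healthy = parse_pg_map(
    "osdmap e133481 pg 11.1b8 (11.1b8) -> up [13,58,37] acting [13,58,37]"
)
single = parse_pg_map("osdmap e1064 pg 2.3a (2.3a) -> up [6] acting [6]")

for info in (healthy, single):
    print(info["pgid"], "acting replicas:", len(info["acting"]))
```

A PG whose acting set has a single entry, like 2.3a above, has only one copy being served.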

Bryan

From:  Dimitar Boichev <dimitar.boic...@axsmarine.com>
Date:  Tuesday, February 23, 2016 at 1:08 AM
To:  CTG User <bryan.stillw...@twcable.com>, "ceph-users@lists.ceph.com"
<ceph-users@lists.ceph.com>
Subject:  RE: [ceph-users] osd not removed from crush map after ceph osd crush 
remove


>Hello,
>Thank you Bryan.
>
>I was planning to upgrade to hammer or later, but first I wanted to
>get the cluster into a healthy state.
>Do you think it is safe to upgrade now, first to the latest firefly and
>then to hammer?
>
>
>Regards.
>
>Dimitar Boichev
>SysAdmin Team Lead
>AXSMarine Sofia
>Phone: +359 889 22 55 42
>Skype: dimitar.boichev.axsmarine
>E-mail:
>dimitar.boic...@axsmarine.com
>
>
>From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com]
>On Behalf Of Stillwell, Bryan
>Sent: Tuesday, February 23, 2016 1:51 AM
>To: ceph-users@lists.ceph.com
>Subject: Re: [ceph-users] osd not removed from crush map after ceph osd 
>crush remove
>
>
>
>Dimitar,
>
>
>
>I'm not sure why those PGs would be stuck in the stale+active+clean 
>state.  Maybe try upgrading to the 0.80.11 release to see if it's a bug 
>that was fixed already?  You can use the 'ceph tell osd.*  version' 
>command after the upgrade to make sure all OSDs are running the new 
>version.  Also since firefly (0.80.x) is near its EOL, you should 
>consider upgrading to hammer (0.94.x).
>
>
>
>As for why osd.4 didn't get fully removed, the last command you ran 
>isn't correct.  It should be 'ceph osd rm 4'.  Trying to remember when 
>to use the CRUSH name (osd.4) versus the OSD number (4)  can be a pain.
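
The name-versus-number distinction can be sketched in Python as follows. This only builds and prints the command strings; it does not run anything against a cluster. `osd_removal_commands` is a hypothetical helper, and the sequence assumes the usual manual removal order (out, crush remove, auth del, rm), with the reweight steps from the original post left out.

```python
def osd_removal_commands(osd_id):
    """Return the manual removal commands for one OSD, in order.

    The CRUSH and auth commands take the name form (osd.N); the
    `osd out` and `osd rm` commands take the bare number (N).
    """
    if osd_id < 0:
        raise ValueError("OSD id must be non-negative")
    name = "osd.%d" % osd_id
    return [
        "ceph osd out %d" % osd_id,         # OSD number
        "ceph osd crush remove %s" % name,  # CRUSH name
        "ceph auth del %s" % name,          # auth entity name
        "ceph osd rm %d" % osd_id,          # OSD number
    ]


commands = osd_removal_commands(4)
for command in commands:
    print(command)
```

Review the printed commands before running them by hand; the last line is the corrected form of the final command.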
>
>
>
>Bryan
>
>
>
>From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of 
>Dimitar Boichev <dimitar.boic...@axsmarine.com>
>Date: Monday, February 22, 2016 at 1:10 AM
>To: Dimitar Boichev <dimitar.boic...@axsmarine.com>, 
>"ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>
>Subject: Re: [ceph-users] osd not removed from crush map after ceph osd 
>crush remove
>
>
>
>>Anyone ?
>>
>>Regards.
>>
>>
>>From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com]
>>On Behalf Of Dimitar Boichev
>>Sent: Thursday, February 18, 2016 5:06 PM
>>To: ceph-users@lists.ceph.com
>>Subject: [ceph-users] osd not removed from crush map after ceph osd 
>>crush remove
>>
>>
>>
>>Hello,
>>I am running a tiny cluster of 2 nodes.
>>ceph -v
>>ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
>>
>>One osd died and I added a new osd (not replacing the old one).
>>After that I wanted to remove the failed osd completely from the cluster.
>>Here is what I did:
>>ceph osd reweight osd.4 0.0
>>ceph osd crush reweight osd.4 0.0
>>ceph osd out osd.4
>>ceph osd crush remove osd.4
>>ceph auth del osd.4
>>ceph osd rm osd.4
>>
>>
>>But after the rebalancing I ended up with 155 PGs in 
>>stale+active+clean state.
>>
>>@storage1:/tmp# ceph -s
>>    cluster 7a9120b9-df42-4308-b7b1-e1f3d0f1e7b3
>>     health HEALTH_WARN 155 pgs stale; 155 pgs stuck stale; 1 requests are blocked > 32 sec; nodeep-scrub flag(s) set
>>     monmap e1: 1 mons at {storage1=192.168.10.3:6789/0}, election epoch 1, quorum 0 storage1
>>     osdmap e1064: 6 osds: 6 up, 6 in
>>            flags nodeep-scrub
>>      pgmap v26760322: 712 pgs, 8 pools, 532 GB data, 155 kobjects
>>            1209 GB used, 14210 GB / 15419 GB avail
>>                 155 stale+active+clean
>>                 557 active+clean
>>  client io 91925 B/s wr, 5 op/s
>>
>>I know about the single-monitor problem. I just want to get the
>>cluster back to a healthy state first; then I will add the third
>>storage node and go up to 3 monitors.
>>
>>The problem is as follows:
>>@storage1:/tmp# ceph pg map 2.3a
>>osdmap e1064 pg 2.3a (2.3a) -> up [6] acting [6]
>>@storage1:/tmp# ceph pg 2.3a query
>>Error ENOENT: i don't have pgid 2.3a
>>
>>
>>@storage1:/tmp# ceph health detail
>>HEALTH_WARN 155 pgs stale; 155 pgs stuck stale; 1 requests are blocked > 32 sec; 1 osds have slow requests; nodeep-scrub flag(s) set
>>pg 7.2a is stuck stale for 8887559.656879, current state
>>stale+active+clean, last acting [4]
>>pg 5.28 is stuck stale for 8887559.656886, current state
>>stale+active+clean, last acting [4]
>>pg 7.2b is stuck stale for 8887559.656889, current state
>>stale+active+clean, last acting [4]
>>pg 7.2c is stuck stale for 8887559.656892, current state
>>stale+active+clean, last acting [4]
>>pg 0.2b is stuck stale for 8887559.656893, current state
>>stale+active+clean, last acting [4]
>>pg 6.2c is stuck stale for 8887559.656894, current state
>>stale+active+clean, last acting [4]
>>pg 6.2f is stuck stale for 8887559.656893, current state
>>stale+active+clean, last acting [4]
>>pg 2.2b is stuck stale for 8887559.656896, current state
>>stale+active+clean, last acting [4]
>>pg 2.25 is stuck stale for 8887559.656896, current state
>>stale+active+clean, last acting [4]
>>pg 6.20 is stuck stale for 8887559.656898, current state
>>stale+active+clean, last acting [4]
>>pg 5.21 is stuck stale for 8887559.656898, current state
>>stale+active+clean, last acting [4]
>>pg 0.24 is stuck stale for 8887559.656904, current state
>>stale+active+clean, last acting [4]
>>pg 2.21 is stuck stale for 8887559.656904, current state
>>stale+active+clean, last acting [4]
>>pg 5.27 is stuck stale for 8887559.656906, current state
>>stale+active+clean, last acting [4]
>>pg 2.23 is stuck stale for 8887559.656908, current state
>>stale+active+clean, last acting [4]
>>pg 6.26 is stuck stale for 8887559.656909, current state
>>stale+active+clean, last acting [4]
>>pg 7.27 is stuck stale for 8887559.656913, current state
>>stale+active+clean, last acting [4]
>>pg 7.18 is stuck stale for 8887559.656914, current state
>>stale+active+clean, last acting [4]
>>pg 0.1e is stuck stale for 8887559.656914, current state
>>stale+active+clean, last acting [4]
>>pg 6.18 is stuck stale for 8887559.656919, current state
>>stale+active+clean, last acting [4]
>>pg 2.1f is stuck stale for 8887559.656919, current state
>>stale+active+clean, last acting [4]
>>pg 7.1b is stuck stale for 8887559.656922, current state
>>stale+active+clean, last acting [4]
>>pg 0.1b is stuck stale for 8887559.656919, current state
>>stale+active+clean, last acting [4]
>>pg 6.1d is stuck stale for 8887559.656925, current state
>>stale+active+clean, last acting [4]
>>pg 2.18 is stuck stale for 8887559.656920, current state
>>stale+active+clean, last acting [4]
>>pg 7.1d is stuck stale for 8887559.656926, current state
>>stale+active+clean, last acting [4]
>>pg 5.1c is stuck stale for 8887559.656921, current state
>>stale+active+clean, last acting [4]
>>pg 5.1d is stuck stale for 8887559.656920, current state
>>stale+active+clean, last acting [4]
>>pg 6.11 is stuck stale for 8887559.656922, current state
>>stale+active+clean, last acting [4]
>>pg 5.13 is stuck stale for 8887559.656919, current state
>>stale+active+clean, last acting [4]
>>pg 0.16 is stuck stale for 8887559.656924, current state
>>stale+active+clean, last acting [4]
>>pg 6.10 is stuck stale for 8887559.656928, current state
>>stale+active+clean, last acting [4]
>>pg 2.17 is stuck stale for 8887559.656927, current state
>>stale+active+clean, last acting [4]
>>pg 7.12 is stuck stale for 8887559.656932, current state
>>stale+active+clean, last acting [4]
>>pg 0.12 is stuck stale for 8887559.656929, current state
>>stale+active+clean, last acting [4]
>>pg 6.14 is stuck stale for 8887559.656935, current state
>>stale+active+clean, last acting [4]
>>pg 0.11 is stuck stale for 8887559.656932, current state
>>stale+active+clean, last acting [4]
>>pg 7.16 is stuck stale for 8887559.656936, current state
>>stale+active+clean, last acting [4]
>>pg 0.10 is stuck stale for 8887559.656936, current state
>>stale+active+clean, last acting [4]
>>pg 2.d is stuck stale for 8887559.656933, current state
>>stale+active+clean, last acting [4]
>>pg 6.9 is stuck stale for 8887559.656939, current state
>>stale+active+clean, last acting [4]
>>pg 7.9 is stuck stale for 8887559.656939, current state
>>stale+active+clean, last acting [4]
>>pg 0.d is stuck stale for 8887559.656940, current state
>>stale+active+clean, last acting [4]
>>pg 7.a is stuck stale for 8887559.656944, current state
>>stale+active+clean, last acting [4]
>>pg 0.c is stuck stale for 8887559.656941, current state
>>stale+active+clean, last acting [4]
>>pg 2.e is stuck stale for 8887559.656947, current state
>>stale+active+clean, last acting [4]
>>pg 6.a is stuck stale for 8887559.656953, current state
>>stale+active+clean, last acting [4]
>>pg 0.b is stuck stale for 8887559.656949, current state
>>stale+active+clean, last acting [4]
>>pg 2.9 is stuck stale for 8887559.656954, current state
>>stale+active+clean, last acting [4]
>>pg 5.f is stuck stale for 8887559.656953, current state
>>stale+active+clean, last acting [4]
>>pg 7.d is stuck stale for 8887559.656958, current state
>>stale+active+clean, last acting [4]
>>pg 6.f is stuck stale for 8887559.656957, current state
>>stale+active+clean, last acting [4]
>>pg 3.4 is stuck stale for 8887559.656957, current state
>>stale+active+clean, last acting [4]
>>pg 5.3 is stuck stale for 8887559.656956, current state
>>stale+active+clean, last acting [4]
>>pg 2.4 is stuck stale for 8887559.656961, current state
>>stale+active+clean, last acting [4]
>>pg 6.0 is stuck stale for 8887559.656966, current state
>>stale+active+clean, last acting [4]
>>pg 3.6 is stuck stale for 8887559.656965, current state
>>stale+active+clean, last acting [4]
>>pg 3.7 is stuck stale for 8887559.656964, current state
>>stale+active+clean, last acting [4]
>>pg 2.6 is stuck stale for 8887559.656970, current state
>>stale+active+clean, last acting [4]
>>pg 0.3 is stuck stale for 8887559.656965, current state
>>stale+active+clean, last acting [4]
>>pg 5.6 is stuck stale for 8887559.656970, current state
>>stale+active+clean, last acting [4]
>>pg 7.4 is stuck stale for 8887559.656975, current state
>>stale+active+clean, last acting [4]
>>pg 3.1 is stuck stale for 8887559.656970, current state
>>stale+active+clean, last acting [4]
>>pg 6.4 is stuck stale for 8887559.656975, current state
>>stale+active+clean, last acting [4]
>>pg 5.4 is stuck stale for 8887559.656972, current state
>>stale+active+clean, last acting [4]
>>pg 2.3 is stuck stale for 8887559.656977, current state
>>stale+active+clean, last acting [4]
>>pg 5.5 is stuck stale for 8887559.656977, current state
>>stale+active+clean, last acting [4]
>>pg 3.3 is stuck stale for 8887559.656982, current state
>>stale+active+clean, last acting [4]
>>pg 5.7a is stuck stale for 8887559.657309, current state
>>stale+active+clean, last acting [4]
>>pg 6.78 is stuck stale for 8887559.657308, current state
>>stale+active+clean, last acting [4]
>>pg 5.78 is stuck stale for 8887559.657311, current state
>>stale+active+clean, last acting [4]
>>pg 5.79 is stuck stale for 8887559.657311, current state
>>stale+active+clean, last acting [4]
>>pg 6.7c is stuck stale for 8887559.657313, current state
>>stale+active+clean, last acting [4]
>>pg 7.7e is stuck stale for 8887559.657312, current state
>>stale+active+clean, last acting [4]
>>pg 6.7e is stuck stale for 8887559.657315, current state
>>stale+active+clean, last acting [4]
>>pg 7.70 is stuck stale for 8887559.657316, current state
>>stale+active+clean, last acting [4]
>>pg 6.73 is stuck stale for 8887559.657316, current state
>>stale+active+clean, last acting [4]
>>pg 5.77 is stuck stale for 8887559.657317, current state
>>stale+active+clean, last acting [4]
>>pg 5.74 is stuck stale for 8887559.657319, current state
>>stale+active+clean, last acting [4]
>>pg 5.75 is stuck stale for 8887559.657321, current state
>>stale+active+clean, last acting [4]
>>pg 7.68 is stuck stale for 8887559.657322, current state
>>stale+active+clean, last acting [4]
>>pg 6.68 is stuck stale for 8887559.657324, current state
>>stale+active+clean, last acting [4]
>>pg 7.6b is stuck stale for 8887559.657326, current state
>>stale+active+clean, last acting [4]
>>pg 6.6d is stuck stale for 8887559.657328, current state
>>stale+active+clean, last acting [4]
>>pg 5.6e is stuck stale for 8887559.657330, current state
>>stale+active+clean, last acting [4]
>>pg 6.6c is stuck stale for 8887559.657330, current state
>>stale+active+clean, last acting [4]
>>pg 7.6f is stuck stale for 8887559.657331, current state
>>stale+active+clean, last acting [4]
>>pg 7.60 is stuck stale for 8887559.657333, current state
>>stale+active+clean, last acting [4]
>>pg 6.60 is stuck stale for 8887559.657333, current state
>>stale+active+clean, last acting [4]
>>pg 7.62 is stuck stale for 8887559.657334, current state
>>stale+active+clean, last acting [4]
>>pg 6.65 is stuck stale for 8887559.657334, current state
>>stale+active+clean, last acting [4]
>>pg 7.64 is stuck stale for 8887559.657339, current state
>>stale+active+clean, last acting [4]
>>pg 5.67 is stuck stale for 8887559.657338, current state
>>stale+active+clean, last acting [4]
>>pg 7.66 is stuck stale for 8887559.657340, current state
>>stale+active+clean, last acting [4]
>>pg 6.66 is stuck stale for 8887559.657340, current state
>>stale+active+clean, last acting [4]
>>pg 7.67 is stuck stale for 8887559.657345, current state
>>stale+active+clean, last acting [4]
>>pg 6.59 is stuck stale for 8887559.657344, current state
>>stale+active+clean, last acting [4]
>>pg 7.58 is stuck stale for 8887559.657348, current state
>>stale+active+clean, last acting [4]
>>pg 6.58 is stuck stale for 8887559.657348, current state
>>stale+active+clean, last acting [4]
>>pg 7.59 is stuck stale for 8887559.657352, current state
>>stale+active+clean, last acting [4]
>>pg 6.5b is stuck stale for 8887559.657353, current state
>>stale+active+clean, last acting [4]
>>pg 5.59 is stuck stale for 8887559.657348, current state
>>stale+active+clean, last acting [4]
>>pg 6.5a is stuck stale for 8887559.657356, current state
>>stale+active+clean, last acting [4]
>>pg 5.5e is stuck stale for 8887559.657352, current state
>>stale+active+clean, last acting [4]
>>pg 6.5d is stuck stale for 8887559.657358, current state
>>stale+active+clean, last acting [4]
>>pg 6.5f is stuck stale for 8887559.657356, current state
>>stale+active+clean, last acting [4]
>>pg 7.51 is stuck stale for 8887559.657356, current state
>>stale+active+clean, last acting [4]
>>pg 7.52 is stuck stale for 8887559.657356, current state
>>stale+active+clean, last acting [4]
>>pg 7.53 is stuck stale for 8887559.657358, current state
>>stale+active+clean, last acting [4]
>>pg 6.55 is stuck stale for 8887559.657359, current state
>>stale+active+clean, last acting [4]
>>pg 7.54 is stuck stale for 8887559.657364, current state
>>stale+active+clean, last acting [4]
>>pg 6.54 is stuck stale for 8887559.657364, current state
>>stale+active+clean, last acting [4]
>>pg 6.57 is stuck stale for 8887559.657365, current state
>>stale+active+clean, last acting [4]
>>pg 7.56 is stuck stale for 8887559.657369, current state
>>stale+active+clean, last acting [4]
>>pg 5.55 is stuck stale for 8887559.657371, current state
>>stale+active+clean, last acting [4]
>>pg 7.48 is stuck stale for 8887559.657372, current state
>>stale+active+clean, last acting [4]
>>pg 6.49 is stuck stale for 8887559.657375, current state
>>stale+active+clean, last acting [4]
>>pg 5.4a is stuck stale for 8887559.657376, current state
>>stale+active+clean, last acting [4]
>>pg 6.48 is stuck stale for 8887559.657379, current state
>>stale+active+clean, last acting [4]
>>pg 7.4a is stuck stale for 8887559.657380, current state
>>stale+active+clean, last acting [4]
>>pg 6.4a is stuck stale for 8887559.657383, current state
>>stale+active+clean, last acting [4]
>>pg 6.4d is stuck stale for 8887559.657385, current state
>>stale+active+clean, last acting [4]
>>pg 7.4d is stuck stale for 8887559.657387, current state
>>stale+active+clean, last acting [4]
>>pg 6.4c is stuck stale for 8887559.657389, current state
>>stale+active+clean, last acting [4]
>>pg 6.4e is stuck stale for 8887559.657391, current state
>>stale+active+clean, last acting [4]
>>pg 5.42 is stuck stale for 8887559.657391, current state
>>stale+active+clean, last acting [4]
>>pg 6.43 is stuck stale for 8887559.657393, current state
>>stale+active+clean, last acting [4]
>>pg 5.41 is stuck stale for 8887559.657393, current state
>>stale+active+clean, last acting [4]
>>pg 5.47 is stuck stale for 8887559.657394, current state
>>stale+active+clean, last acting [4]
>>pg 7.46 is stuck stale for 8887559.657396, current state
>>stale+active+clean, last acting [4]
>>pg 6.39 is stuck stale for 8887559.657398, current state
>>stale+active+clean, last acting [4]
>>pg 5.3a is stuck stale for 8887559.657399, current state
>>stale+active+clean, last acting [4]
>>pg 2.3e is stuck stale for 8887559.657399, current state
>>stale+active+clean, last acting [4]
>>pg 0.3c is stuck stale for 8887559.657402, current state
>>stale+active+clean, last acting [4]
>>pg 7.3c is stuck stale for 8887559.657404, current state
>>stale+active+clean, last acting [4]
>>pg 7.3d is stuck stale for 8887559.657405, current state
>>stale+active+clean, last acting [4]
>>pg 0.39 is stuck stale for 8887559.657402, current state
>>stale+active+clean, last acting [4]
>>pg 5.3c is stuck stale for 8887559.657405, current state
>>stale+active+clean, last acting [4]
>>pg 2.3a is stuck stale for 8887559.657406, current state
>>stale+active+clean, last acting [4]
>>pg 0.38 is stuck stale for 8887559.657409, current state
>>stale+active+clean, last acting [4]
>>pg 2.35 is stuck stale for 8887559.657411, current state
>>stale+active+clean, last acting [4]
>>pg 0.37 is stuck stale for 8887559.657412, current state
>>stale+active+clean, last acting [4]
>>pg 5.32 is stuck stale for 8887559.657413, current state
>>stale+active+clean, last acting [4]
>>pg 2.34 is stuck stale for 8887559.657416, current state
>>stale+active+clean, last acting [4]
>>pg 0.36 is stuck stale for 8887559.657416, current state
>>stale+active+clean, last acting [4]
>>pg 7.32 is stuck stale for 8887559.657419, current state
>>stale+active+clean, last acting [4]
>>pg 6.33 is stuck stale for 8887559.657420, current state
>>stale+active+clean, last acting [4]
>>pg 0.35 is stuck stale for 8887559.657423, current state
>>stale+active+clean, last acting [4]
>>pg 6.35 is stuck stale for 8887559.657423, current state
>>stale+active+clean, last acting [4]
>>pg 5.36 is stuck stale for 8887559.657424, current state
>>stale+active+clean, last acting [4]
>>pg 2.30 is stuck stale for 8887559.657427, current state
>>stale+active+clean, last acting [4]
>>pg 5.37 is stuck stale for 8887559.657429, current state
>>stale+active+clean, last acting [4]
>>pg 7.36 is stuck stale for 8887559.657430, current state
>>stale+active+clean, last acting [4]
>>pg 6.37 is stuck stale for 8887559.657432, current state
>>stale+active+clean, last acting [4]
>>pg 6.28 is stuck stale for 8887559.657427, current state
>>stale+active+clean, last acting [4]
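
A long listing like the one above is easier to read once summarized. The following is a minimal Python sketch that counts stuck-stale PGs per last-acting OSD from pasted `ceph health detail` text. It assumes the entry wording shown in this thread and tolerates the mail quote markers and line wrapping; `stale_pgs_by_osd` is a hypothetical helper name.

```python
import re
from collections import Counter

# One "stuck stale" entry, after quote markers are stripped and wrapped
# lines are joined. The wording is taken from the listing in this thread.
STUCK_RE = re.compile(
    r"pg (?P<pgid>[0-9a-f]+\.[0-9a-f]+) is stuck stale for (?P<secs>[\d.]+), "
    r"current state\s+(?P<state>\S+), last acting \[(?P<acting>[\d,]*)\]"
)


def stale_pgs_by_osd(text):
    """Count stuck-stale PGs per OSD listed in 'last acting'."""
    # Remove leading '>' quote markers and join wrapped lines with spaces.
    cleaned = " ".join(line.lstrip("> ").strip() for line in text.splitlines())
    counts = Counter()
    for match in STUCK_RE.finditer(cleaned):
        for osd in match.group("acting").split(","):
            if osd:
                counts[int(osd)] += 1
    return counts


sample = """\
>>pg 7.2a is stuck stale for 8887559.656879, current state
>>stale+active+clean, last acting [4]
>>pg 5.28 is stuck stale for 8887559.656886, current state
>>stale+active+clean, last acting [4]
>>pg 2.d is stuck stale for 8887559.656933, current state
>>stale+active+clean, last acting [4]
"""

stale_counts = stale_pgs_by_osd(sample)
for osd, count in sorted(stale_counts.items()):
    print("osd.%d: %d stale PGs" % (osd, count))
```

In the listing above, every entry reports `last acting [4]`, so a summary of this kind would attribute all of the stale PGs to the removed osd.4.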
>>
>>
>>It stays that way, and I think the reason is what I found when I
>>downloaded and decompiled the crush map:
>>@storage1:/tmp# crushtool -d /tmp/crushmap
>># begin crush map
>>tunable choose_local_tries 0
>>tunable choose_local_fallback_tries 0
>>tunable choose_total_tries 50
>>tunable chooseleaf_descend_once 1
>>
>># devices
>>device 0 osd.0
>>device 1 osd.1
>>device 2 osd.2
>>device 3 osd.3
>>device 4 device4
>>device 5 osd.5
>>device 6 osd.6
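
As a rough way to spot such entries, here is a small Python sketch that scans the `# devices` section of a decompiled CRUSH map for ids whose name is the placeholder `deviceN` rather than `osd.N`. Treating `device N deviceN` as a placeholder for an id with no named OSD is an assumption drawn from the listing above, and `placeholder_devices` is a hypothetical helper name.

```python
import re

# "device <id> <name>" at the start of a line, as in the listing above.
DEVICE_RE = re.compile(r"^device (\d+) (\S+)")


def placeholder_devices(crush_text):
    """Return device ids whose name is 'device<id>' instead of 'osd.<id>'."""
    ids = []
    for line in crush_text.splitlines():
        # Tolerate mail quote markers in pasted text.
        match = DEVICE_RE.match(line.lstrip("> ").strip())
        if match and match.group(2) == "device" + match.group(1):
            ids.append(int(match.group(1)))
    return ids


devices_section = """\
# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 device4
device 5 osd.5
device 6 osd.6
"""

placeholders = placeholder_devices(devices_section)
print("placeholder device ids:", placeholders)
```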
>>
>>
>>
>>Is there a way to remove this device 4 aka osd.4 from here so ceph can
>>make another copy from the other location shown in "ceph pg map 2.3a"?
>>
>>Regards.
>>
>>Dimitar Boichev
>>SysAdmin Team Lead
>>AXSMarine Sofia
>>Phone: +359 889 22 55 42
>>Skype: dimitar.boichev.axsmarine
>>E-mail:
>>dimitar.boic...@axsmarine.com
>>
>>


_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com