Incidentally, I am having similar issues with other PGs.

For instance:
pg 0.23 is stuck stale for 302497.994355, current state stale+active+clean, last acting [5,2,4]


when I do:
# ceph pg 0.23 query

or
# ceph pg 5.5 query

It also freezes.  I can't see anything unusual in the log files or in any of
the OSD status output.
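
My guess, for what it's worth, is that ceph pg query needs an answer from
the PG's primary OSD, so when nothing is actually serving the PG the command
simply blocks.  Wrapping it in a timeout (GNU coreutils) at least gives the
prompt back instead of hanging forever:

# timeout 30 ceph pg 0.23 query || echo "pg query timed out"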

On Sun, Jun 7, 2015 at 8:41 AM, Marek Dohojda <[email protected]>
wrote:

> I think this is the issue.  If you look at ceph health detail you will see
> that 0.21 and others are orphaned:
> HEALTH_WARN 65 pgs stale; 22 pgs stuck inactive; 65 pgs stuck stale; 22 pgs stuck unclean; too many PGs per OSD (456 > max 300)
> pg 0.21 is stuck inactive since forever, current state creating, last acting []
> pg 0.7 is stuck inactive since forever, current state creating, last acting []
> pg 5.2 is stuck inactive since forever, current state creating, last acting []
> pg 1.7 is stuck inactive since forever, current state creating, last acting []
> pg 0.34 is stuck inactive since forever, current state creating, last acting []
> pg 0.33 is stuck inactive since forever, current state creating, last acting []
> pg 5.1 is stuck inactive since forever, current state creating, last acting []
> pg 0.1b is stuck inactive since forever, current state creating, last acting []
> pg 0.32 is stuck inactive since forever, current state creating, last acting []
> pg 1.2 is stuck inactive since forever, current state creating, last acting []
> pg 0.31 is stuck inactive since forever, current state creating, last acting []
> pg 2.0 is stuck inactive since forever, current state creating, last acting []
> pg 5.7 is stuck inactive since forever, current state creating, last acting []
> pg 1.0 is stuck inactive since forever, current state creating, last acting []
> pg 2.2 is stuck inactive since forever, current state creating, last acting []
> pg 0.16 is stuck inactive since forever, current state creating, last acting []
> pg 0.15 is stuck inactive since forever, current state creating, last acting []
> pg 0.2b is stuck inactive since forever, current state creating, last acting []
> pg 0.3f is stuck inactive since forever, current state creating, last acting []
> pg 0.27 is stuck inactive since forever, current state creating, last acting []
> pg 0.3c is stuck inactive since forever, current state creating, last acting []
> pg 0.3a is stuck inactive since forever, current state creating, last acting []
> pg 0.21 is stuck unclean since forever, current state creating, last acting []
> pg 0.7 is stuck unclean since forever, current state creating, last acting []
> pg 5.2 is stuck unclean since forever, current state creating, last acting []
> pg 1.7 is stuck unclean since forever, current state creating, last acting []
> pg 0.34 is stuck unclean since forever, current state creating, last acting []
> pg 0.33 is stuck unclean since forever, current state creating, last acting []
> pg 5.1 is stuck unclean since forever, current state creating, last acting []
> pg 0.1b is stuck unclean since forever, current state creating, last acting []
> pg 0.32 is stuck unclean since forever, current state creating, last acting []
> pg 1.2 is stuck unclean since forever, current state creating, last acting []
> pg 0.31 is stuck unclean since forever, current state creating, last acting []
> pg 2.0 is stuck unclean since forever, current state creating, last acting []
> pg 5.7 is stuck unclean since forever, current state creating, last acting []
> pg 1.0 is stuck unclean since forever, current state creating, last acting []
> pg 2.2 is stuck unclean since forever, current state creating, last acting []
> pg 0.16 is stuck unclean since forever, current state creating, last acting []
> pg 0.15 is stuck unclean since forever, current state creating, last acting []
> pg 0.2b is stuck unclean since forever, current state creating, last acting []
> pg 0.3f is stuck unclean since forever, current state creating, last acting []
> pg 0.27 is stuck unclean since forever, current state creating, last acting []
> pg 0.3c is stuck unclean since forever, current state creating, last acting []
> pg 0.3a is stuck unclean since forever, current state creating, last acting []
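>
> To pull just the PG ids out of that wall of output, something along these
> lines should work (untested; the pg id is the second field of each line):
>
> # ceph health detail | awk '/stuck inactive/ {print $2}'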
>
>
> On Sun, Jun 7, 2015 at 8:39 AM, Alex Muntada <[email protected]> wrote:
>
>> That happened to us as well; after moving the OSDs with blocked requests
>> out of the cluster, it eventually returned to HEALTH_OK.
>>
>> Running ceph health detail should list those OSDs. Do you have any?
>> On 07/06/2015 16:16, "Marek Dohojda" <[email protected]>
>> wrote:
>>
>>> Thank you.  Unfortunately this won't work, because 0.21 is already being
>>> created:
>>> ~# ceph pg force_create_pg 0.21
>>> pg 0.21 already creating
>>>
>>>
>>> I think, and I am guessing here since I don't know the internals that well,
>>> that 0.21 started to be created, but since its OSDs disappeared it never
>>> finished and it keeps trying.
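>>>
>>> If that guess is right, they should all still show as "creating" in the
>>> pg dump as well, e.g.:
>>>
>>> # ceph pg dump pgs_brief | grep creating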
>>>
>>> On Sun, Jun 7, 2015 at 12:18 AM, Alex Muntada <[email protected]> wrote:
>>>
>>>> Marek Dohojda:
>>>>
>>>>> One of the stuck inactive PGs is 0.21; here is the output of ceph pg map:
>>>>>
>>>>> #ceph pg map 0.21
>>>>> osdmap e579 pg 0.21 (0.21) -> up [] acting []
>>>>>
>>>>> #ceph pg dump_stuck stale
>>>>> ok
>>>>> pg_stat state   up      up_primary      acting  acting_primary
>>>>> 0.22    stale+active+clean      [5,1,6] 5       [5,1,6] 5
>>>>> 0.1f    stale+active+clean      [2,0,4] 2       [2,0,4] 2
>>>>> <redacted for ease of reading>
>>>>>
>>>>> # ceph osd stat
>>>>>      osdmap e579: 14 osds: 14 up, 14 in
>>>>>
>>>>> If I do
>>>>> #ceph pg 0.21 query
>>>>>
>>>>> The command freezes and never returns any output.
>>>>>
>>>>> I suspect that the problem is that these PGs were created but the OSDs
>>>>> they were initially created on disappeared.  So I believe I should just
>>>>> remove these PGs, but honestly I don't see how.
>>>>>
>>>>> Does anybody have any ideas as to what to do next?
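>>>>>
>>>>> (As far as I can tell there is no command to remove a single PG; the
>>>>> closest thing seems to be deleting and recreating the whole pool, e.g.
>>>>> ceph osd pool delete <pool> <pool> --yes-i-really-really-mean-it, which
>>>>> I would rather avoid.)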
>>>>>
>>>>
>>>> ceph pg force_create_pg 0.21
>>>>
>>>> We were experimenting with this same scenario last week: we deliberately
>>>> stopped the 3 OSDs holding the replicas of one PG to find out how it would
>>>> affect the cluster, and we ended up with a stale PG and 400 requests
>>>> blocked for a long time. After trying several commands to get the cluster
>>>> back, the one that made the difference was force_create_pg, followed by
>>>> moving the OSDs with blocked requests out of the cluster.
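>>>>
>>>> From memory, the sequence was roughly the following (the pg and osd ids
>>>> are only placeholders, yours will differ):
>>>>
>>>> # ceph pg force_create_pg <pgid>
>>>> # ceph osd out <osd-id>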
>>>>
>>>> Hope that helps,
>>>> Alex
>>>>
>>>
>>>
>