Incidentally, I am having similar issues with other PGs. For instance, pg 0.23 is stuck stale for 302497.994355, current state stale+active+clean, last acting [5,2,4].
When I do:
# ceph pg 0.23 query
or
# ceph pg 5.5 query
it also freezes. I can't see anything unusual in the log files or in any display of the OSDs.

On Sun, Jun 7, 2015 at 8:41 AM, Marek Dohojda <[email protected]> wrote:
> I think this is the issue. Look at ceph health detail and you will see that
> 0.21 and others are orphans:
>
> HEALTH_WARN 65 pgs stale; 22 pgs stuck inactive; 65 pgs stuck stale; 22 pgs stuck unclean; too many PGs per OSD (456 > max 300)
> pg 0.21 is stuck inactive since forever, current state creating, last acting []
> pg 0.7 is stuck inactive since forever, current state creating, last acting []
> pg 5.2 is stuck inactive since forever, current state creating, last acting []
> pg 1.7 is stuck inactive since forever, current state creating, last acting []
> pg 0.34 is stuck inactive since forever, current state creating, last acting []
> pg 0.33 is stuck inactive since forever, current state creating, last acting []
> pg 5.1 is stuck inactive since forever, current state creating, last acting []
> pg 0.1b is stuck inactive since forever, current state creating, last acting []
> pg 0.32 is stuck inactive since forever, current state creating, last acting []
> pg 1.2 is stuck inactive since forever, current state creating, last acting []
> pg 0.31 is stuck inactive since forever, current state creating, last acting []
> pg 2.0 is stuck inactive since forever, current state creating, last acting []
> pg 5.7 is stuck inactive since forever, current state creating, last acting []
> pg 1.0 is stuck inactive since forever, current state creating, last acting []
> pg 2.2 is stuck inactive since forever, current state creating, last acting []
> pg 0.16 is stuck inactive since forever, current state creating, last acting []
> pg 0.15 is stuck inactive since forever, current state creating, last acting []
> pg 0.2b is stuck inactive since forever, current state creating, last acting []
> pg 0.3f is stuck inactive since forever, current state creating, last acting []
> pg 0.27 is stuck inactive since forever, current state creating, last acting []
> pg 0.3c is stuck inactive since forever, current state creating, last acting []
> pg 0.3a is stuck inactive since forever, current state creating, last acting []
> pg 0.21 is stuck unclean since forever, current state creating, last acting []
> pg 0.7 is stuck unclean since forever, current state creating, last acting []
> pg 5.2 is stuck unclean since forever, current state creating, last acting []
> pg 1.7 is stuck unclean since forever, current state creating, last acting []
> pg 0.34 is stuck unclean since forever, current state creating, last acting []
> pg 0.33 is stuck unclean since forever, current state creating, last acting []
> pg 5.1 is stuck unclean since forever, current state creating, last acting []
> pg 0.1b is stuck unclean since forever, current state creating, last acting []
> pg 0.32 is stuck unclean since forever, current state creating, last acting []
> pg 1.2 is stuck unclean since forever, current state creating, last acting []
> pg 0.31 is stuck unclean since forever, current state creating, last acting []
> pg 2.0 is stuck unclean since forever, current state creating, last acting []
> pg 5.7 is stuck unclean since forever, current state creating, last acting []
> pg 1.0 is stuck unclean since forever, current state creating, last acting []
> pg 2.2 is stuck unclean since forever, current state creating, last acting []
> pg 0.16 is stuck unclean since forever, current state creating, last acting []
> pg 0.15 is stuck unclean since forever, current state creating, last acting []
> pg 0.2b is stuck unclean since forever, current state creating, last acting []
> pg 0.3f is stuck unclean since forever, current state creating, last acting []
> pg 0.27 is stuck unclean since forever, current state creating, last acting []
> pg 0.3c is stuck unclean since forever, current state creating, last acting []
> pg 0.3a is stuck unclean since forever, current state creating, last acting []
>
> On Sun, Jun 7, 2015 at 8:39 AM, Alex Muntada <[email protected]> wrote:
>> That happened to us as well, but after moving the OSDs with blocked requests
>> out of the cluster it eventually regained HEALTH_OK.
>>
>> Running ceph health detail should list those OSDs. Do you have any?
>>
>> On 07/06/2015 16:16, "Marek Dohojda" <[email protected]> wrote:
>>> Thank you. Unfortunately this won't work, because 0.21 is already being created:
>>> ~# ceph pg force_create_pg 0.21
>>> pg 0.21 already creating
>>>
>>> I think, and I am guessing here since I don't know the internals that well,
>>> that 0.21 started to be created, but since its OSDs disappeared it never
>>> finished, and it keeps trying.
>>>
>>> On Sun, Jun 7, 2015 at 12:18 AM, Alex Muntada <[email protected]> wrote:
>>>> Marek Dohojda:
>>>>
>>>>> One of the stuck inactive PGs is 0.21, and here is the output of ceph pg map:
>>>>>
>>>>> # ceph pg map 0.21
>>>>> osdmap e579 pg 0.21 (0.21) -> up [] acting []
>>>>>
>>>>> # ceph pg dump_stuck stale
>>>>> ok
>>>>> pg_stat state up up_primary acting acting_primary
>>>>> 0.22 stale+active+clean [5,1,6] 5 [5,1,6] 5
>>>>> 0.1f stale+active+clean [2,0,4] 2 [2,0,4] 2
>>>>> <redacted for ease of reading>
>>>>>
>>>>> # ceph osd stat
>>>>> osdmap e579: 14 osds: 14 up, 14 in
>>>>>
>>>>> If I do:
>>>>> # ceph pg 0.21 query
>>>>> the command freezes and never returns any output.
>>>>>
>>>>> I suspect that the problem is that these PGs were created, but the OSDs
>>>>> that they were initially created under disappeared. So I believe that I
>>>>> should just remove these PGs, but honestly I don't see how.
>>>>>
>>>>> Does anybody have any ideas as to what to do next?
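[Editor's note: as described above, ceph pg <id> query can block forever when a PG's acting set is empty. When scripting diagnostics against such a cluster, it helps to bound those calls with a timeout rather than hang. A minimal sketch, not from the thread; run_with_timeout is a hypothetical helper, and the command list and timeout value are illustrative:]

```python
import subprocess

def run_with_timeout(cmd, seconds=10):
    """Run a CLI command, giving up after `seconds`.

    A query against a PG whose acting set is empty (like
    `ceph pg 0.21 query` above) can block indefinitely, so a
    hang is reported as None rather than blocking the script.
    """
    try:
        res = subprocess.run(cmd, capture_output=True, text=True,
                             timeout=seconds)
        return res.stdout
    except subprocess.TimeoutExpired:
        return None  # treat a hang as "no data"

# Stand-in command for illustration; against a real cluster you would
# pass something like ["ceph", "pg", "0.21", "query"].
print(run_with_timeout(["sleep", "30"], seconds=1))  # prints None after ~1s
```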
>>>>
>>>> ceph pg force_create_pg 0.21
>>>>
>>>> We've been playing last week with this same scenario: we stopped on
>>>> purpose the 3 OSDs holding the replicas of one PG, to find out how it
>>>> affected the cluster, and we ended up with a stale PG and 400 requests
>>>> blocked for a long time. After trying several commands to get the
>>>> cluster back, the one that made the difference was force_create_pg,
>>>> followed by moving the OSDs with blocked requests out of the cluster.
>>>>
>>>> Hope that helps,
>>>> Alex
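[Editor's note: the `ceph health detail` lines quoted in the thread have a regular shape, so the orphaned PGs (those reported with `last acting []`) can be collected mechanically instead of by eye. A sketch assuming that exact line format; `orphaned_pgs` is a hypothetical helper name, and sample lines are embedded so it runs without a cluster:]

```python
import re

# Sample lines in the format shown by `ceph health detail` in the thread,
# embedded here so the sketch is self-contained.
SAMPLE = """\
pg 0.21 is stuck inactive since forever, current state creating, last acting []
pg 5.2 is stuck inactive since forever, current state creating, last acting []
pg 0.22 is stuck stale for 302497.994355, current state stale+active+clean, last acting [5,1,6]
"""

PG_LINE = re.compile(
    r"pg (?P<pgid>\S+) is stuck (?P<why>\w+) .*?"
    r"current state (?P<state>\S+), last acting \[(?P<acting>[^\]]*)\]"
)

def orphaned_pgs(health_detail):
    """Return PG ids stuck with an empty acting set, i.e. the
    'creating, last acting []' PGs discussed in the thread."""
    found = []
    for line in health_detail.splitlines():
        m = PG_LINE.search(line)
        if m and m.group("acting") == "":
            found.append(m.group("pgid"))
    return found

print(orphaned_pgs(SAMPLE))  # ['0.21', '5.2']
```

The stale PG 0.22 is excluded because its acting set is non-empty; only the PGs that apparently lost their OSDs mid-creation are reported.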
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
