Hi Wido!
On Wed, 9 Dec 2015, Wido den Hollander wrote:
> Hi,
>
> I'm working on a patch in PGMonitor.cc that sets the state to HEALTH_ERR
> if >= X PGs are stuck non-active.
>
> This works for me now, but I would like to add a timer that a PG has to
> be inactive for more than Y seconds.
>
> The PGMap contains "last_active" and "last_clean", but these timestamps
> are never updated. So I can't query for last_active <= (now() - 300),
> for example.
>
> On an idle test cluster I have a PG for example:
>
> "last_active": "2015-12-09 02:32:31.540712",
>
> It's currently 08:53:56 here, so I can't check against last_active.
>
> What would a good way be to see for how long a PG has been inactive?
It sounds like maybe the current code is subtly broken:
https://github.com/ceph/ceph/blob/master/src/osd/PG.cc#L2566
The last_active, last_clean, etc. timestamps should never be more than
osd_pg_stat_report_interval_max seconds old.
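
Once the timestamps are refreshed as expected, the check described above
could look roughly like the sketch below. This is only an illustration in
Python, not the PGMonitor.cc code: the function names, the dict layout and
the parameters min_inactive_secs (the "Y seconds") and max_stuck (the "X
PGs") are all hypothetical. It assumes each PG stat carries a "state"
string such as "active+clean" and a "last_active" timestamp in the format
shown in the quoted example.

```python
from datetime import datetime, timedelta

# Timestamp layout as it appears in the quoted PG dump (assumption).
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def stuck_inactive(pg_stats, now, min_inactive_secs):
    """Return the ids of PGs that are not active and whose last_active
    timestamp is at least min_inactive_secs in the past.

    pg_stats is a hypothetical mapping: pgid -> {"state", "last_active"}.
    """
    cutoff = now - timedelta(seconds=min_inactive_secs)
    stuck = []
    for pgid, stat in pg_stats.items():
        # A state like "stale+active+clean" still counts as active here.
        if "active" in stat["state"].split("+"):
            continue
        last_active = datetime.strptime(stat["last_active"], TS_FORMAT)
        if last_active <= cutoff:
            stuck.append(pgid)
    return stuck


def health(pg_stats, now, min_inactive_secs, max_stuck):
    """Report HEALTH_ERR once max_stuck or more PGs are stuck inactive."""
    count = len(stuck_inactive(pg_stats, now, min_inactive_secs))
    return "HEALTH_ERR" if count >= max_stuck else "HEALTH_OK"


example_pgs = {
    "1.0": {"state": "peering", "last_active": "2015-12-09 02:32:31.540712"},
    "1.1": {"state": "active+clean", "last_active": "2015-12-09 08:53:50.000000"},
    "1.2": {"state": "down+peering", "last_active": "2015-12-09 08:52:00.000000"},
}
example_now = datetime(2015, 12, 9, 8, 53, 56)
example_stuck = stuck_inactive(example_pgs, example_now, 300)
```

Note that this only gives a meaningful answer if last_active is kept
fresh for PGs that are active. With stale timestamps, a PG that only
just went inactive would look as if it had been inactive for hours.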
sage