On Fri, Jan 27, 2012 at 1:32 PM, Sage Weil <[email protected]> wrote:
> Please review.
>
> If the monitor sees an osdmap go by where nodes go down (or up) it will
> scan its pg_map and mark any pg whose primary is down as 'stale'.  If/when
> the pg recovers, that will get refreshed.  If not, the admin will know
> something is up.
Hmm. Without any kind of timeout, this flag will get set every time an
OSD goes down: the replicas won't alert the new primary until after
they get the map marking their old primary down, and this check runs
synchronously with the generation of that map.
The "spurious" stale marker on each PG isn't a big deal (it'll
disappear after a few seconds), but if we're going to set HEALTH_WARN
based on it, that seems like a bit much to me.
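
To make the concern concrete, here is a rough Python sketch of a scan
that runs as soon as the new map is generated. This is not the actual
monitor code: the function name mark_stale_pgs, the dict layout of
pg_map, and the 'stale' state string are assumptions made up for
illustration.

```python
# Hypothetical sketch, not Ceph's monitor code: pg_map maps a pg id to
# a dict holding its primary OSD id and a set of state flags.
def mark_stale_pgs(pg_map, down_osds):
    """Flag every pg whose primary is in down_osds; return flagged pg ids."""
    flagged = []
    for pgid, pg in sorted(pg_map.items()):
        if pg["primary"] in down_osds:
            pg["state"].add("stale")
            flagged.append(pgid)
    return flagged


pg_map = {
    "1.0": {"primary": 0, "state": {"active", "clean"}},
    "1.1": {"primary": 1, "state": {"active", "clean"}},
    "1.2": {"primary": 0, "state": {"active", "clean"}},
}

# osd.0 is marked down in the new map; the scan runs right away, before
# any replica has had a chance to report to a new primary.
flagged = mark_stale_pgs(pg_map, down_osds={0})
print(flagged)
```

Every pg whose primary was osd.0 gets flagged at once, so a health
check that keys off the flag alone would fire on every OSD failure.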

> We'll soon be adding the last_active, last_clean, and now last_unstale (?)
> fields so that bigger alarms can go off when the pg stays stale for more
> than a few seconds...
Yeah; I think we want to use those fields to trigger big warnings, but
we shouldn't trigger warnings from the stale flag without them.
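
Something along these lines, perhaps. This is only a sketch under
assumptions: last_unstale is the field proposed above, while the
function name stale_health_warnings, the STALE_GRACE value, and the pg
dict layout are hypothetical and not taken from the Ceph tree.

```python
# Hypothetical sketch: last_unstale is the timestamp field proposed in
# this thread; STALE_GRACE and the pg dict layout are assumptions.
STALE_GRACE = 30.0  # seconds; an arbitrary illustrative value


def stale_health_warnings(pg_map, now, grace=STALE_GRACE):
    """Return ids of pgs that have been stale for longer than grace."""
    warn = []
    for pgid, pg in sorted(pg_map.items()):
        if "stale" not in pg["state"]:
            continue
        # Only warn once the pg has stayed stale past the grace period.
        if now - pg["last_unstale"] > grace:
            warn.append(pgid)
    return warn


pg_map = {
    "1.0": {"state": {"stale"}, "last_unstale": 95.0},  # just went stale
    "1.1": {"state": {"stale"}, "last_unstale": 10.0},  # stale for a while
    "1.2": {"state": {"active", "clean"}, "last_unstale": 99.0},
}
warn_pgs = stale_health_warnings(pg_map, now=100.0)
print(warn_pgs)
```

A pg that only just went stale stays out of the warning list, and only
one that has remained stale past the grace period is reported.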
-Greg


>
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to [email protected]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html