Faidon/paravoid's cluster has a bunch of OSDs that are up, but the PG
queries show them roughly 14,000 epochs behind:
  "history": { "epoch_created": 14,
      "last_epoch_started": 88174,
      "last_epoch_clean": 88174,
      "last_epoch_split": 0,
      "same_up_since": 88172,
      "same_interval_since": 88172,
      "same_primary_since": 88172,
(where the current map epoch is 102000 or thereabouts).
I think just restarting all OSDs at once will get him caught up (especially
with 'ceph osd set noup' holding them down until they are done processing
maps), but I wonder if we want an additional check: if any PG falls more
than X epochs behind, the OSD marks itself down and catches up before
rejoining.
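To make the proposed check concrete, here is a minimal sketch in Python. It is not Ceph code: `PGState`, its `map_epoch` field (assumed to be the last OSDMap epoch a PG has processed), `lagging_pgs`, `should_mark_self_down`, `catch_up` and the `max_lag` threshold (the "X" above) are all hypothetical names for illustration. It uses a per-PG processed epoch rather than `same_interval_since`, since a healthy PG whose interval has not changed can legitimately carry an old `same_interval_since`.

```python
from dataclasses import dataclass


@dataclass
class PGState:
    pgid: str
    map_epoch: int  # hypothetical: last OSDMap epoch this PG has processed


def lagging_pgs(pgs, current_epoch, max_lag):
    """Return PGs whose processed epoch trails current_epoch by more than max_lag."""
    return [pg for pg in pgs if current_epoch - pg.map_epoch > max_lag]


def should_mark_self_down(pgs, current_epoch, max_lag):
    """The OSD should mark itself down if any PG is more than max_lag epochs behind."""
    return bool(lagging_pgs(pgs, current_epoch, max_lag))


def catch_up(pgs, current_epoch):
    # Hypothetical stand-in for processing the missed maps while down.
    for pg in pgs:
        pg.map_epoch = current_epoch


pgs = [PGState("3.1f", 88172), PGState("3.20", 101998)]
print(should_mark_self_down(pgs, 102000, 500))  # True: 3.1f is 13828 epochs behind
catch_up(pgs, 102000)
print(should_mark_self_down(pgs, 102000, 500))  # False: safe to come back up
```

The threshold is a tunable: too small and OSDs would bounce on ordinary map churn, too large and a stuck OSD stays up serving stale state for a long time.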
What do you think?
sage