On 28/05/2015 17:41, Robert LeBlanc wrote:
> Let me see if I understand this... Your idea is to have a progress bar
> that shows (active+clean + active+scrub + active+deep-scrub) / pgs and
> then estimate time remaining?
Not quite: it's not about doing a calculation on the global PG state
counts. The code identifies specific PGs affected by specific
operations, and then watches the status of those PGs.
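As a rough illustration of that approach (a sketch only, not the actual code: the ProgressEvent class, its fields, and the choice to latch a PG as done once it has been seen active+clean are all assumptions made for the example):

```python
from dataclasses import dataclass, field


@dataclass
class ProgressEvent:
    """Hypothetical event that watches a fixed set of PGs chosen at creation."""
    message: str
    pg_ids: frozenset                      # PGs affected by this operation
    done: set = field(default_factory=set)

    def update(self, pg_states):
        """pg_states maps PG id -> state string, e.g. 'active+clean'."""
        for pgid in self.pg_ids:
            if pg_states.get(pgid) == "active+clean":
                # Assumption: once a PG is counted as done it stays done,
                # so the reported progress never decreases.
                self.done.add(pgid)
        # PGs that are not in self.pg_ids (for example ones created by a
        # split) are ignored by this event.

    @property
    def progress(self):
        if not self.pg_ids:
            return 1.0
        return len(self.done) / len(self.pg_ids)


ev = ProgressEvent("osd marked out", frozenset({"1.0", "1.1"}))
ev.update({"1.0": "active+clean", "1.1": "active+recovering", "1.2": "creating"})
print(ev.progress)
```

The point of the sketch is only that the denominator is fixed when the event is created, rather than being derived from the global PG state counts.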
> So if PGs are split the numbers change and the progress bar goes
> backwards, is that a big deal?
I don't see a case where the progress bars go backwards with the code I
have so far? In the case of operations on PGs that split, it'll just
ignore the new PGs, but you'll get a separate event tracking the
creation of the new ones. In general, progress bars going backwards
isn't something we should allow to happen (happy to hear
counterexamples though, I'm mainly speaking from intuition on that point!)
If this was extended to track operations across PG splits (it's unclear
to me that that complexity is worthwhile), then the bar still wouldn't
need to go backwards, as whatever stat was being tracked would remain
the same when summed across the newly split PGs.
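To make that concrete, here is a minimal sketch assuming the tracked stat is a per-PG count of objects still to be recovered (the function name, the stat, and the PG ids are hypothetical, chosen only for illustration):

```python
def fraction_done(total_at_start, remaining_by_pg):
    """Progress for one operation, from a per-PG 'objects remaining' stat.

    total_at_start is fixed when the operation begins; remaining_by_pg
    maps PG id -> objects still to recover and may gain entries on a split.
    """
    if total_at_start == 0:
        return 1.0
    remaining = sum(remaining_by_pg.values())
    return 1.0 - remaining / total_at_start


# Before the split: one PG with 60 of 100 objects still to recover.
before = fraction_done(100, {"1.0": 60})

# After the split: the same 60 objects are divided between parent and child.
after = fraction_done(100, {"1.0": 35, "1.8": 25})

print(before == after)
```

Because the split only redistributes the remaining objects between PGs, the sum is unchanged and so is the reported fraction.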
> I don't think so, it might take a
> little time to recalculate how long it will take, but no big deal. I
> do like the idea of the progress bar even if it is fuzzy. I keep
> running ceph status or ceph -w to watch things and have to imagine it
> in my mind.
Right, the idea is to save the admin from having to interpret PG counts
mentally.
> It might be nice to have some other stats like client I/O
> and rebuild I/O so that I can see if recovery is impacting production
> I/O.
We already have some of these stats globally, but it would be nice to be
able to reason about what proportion of I/O is associated with specific
operations, e.g. "I have some total recovery IO number, what proportion
of that is due to a particular drive failure?". Without going and
looking at current pg stat structures I don't know if there is enough
data in the mon right now to guess those numbers. This would
*definitely* be heuristic rather than exact, in any case.
Cheers,
John