I just tracked down what caused the out-of-order reply:
- req 1 hit a degraded object, put on waiting_for_missing_object
- pg change, NullEvt queued
- req 2 dequeued, put on waiting_for_active list
- on_change()
- scrub_clear_state()
- requeues waiting_for_active
- requeues waiting_for_missing_object
...
The obvious fix for this case is to reorder the call to scrub_clear_state() in
on_change(), but I wonder if there are other cases where scrub's use of
waiting_for_active could break ordering.
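
To make the sequence above concrete, here is a rough toy model of it. It is
only a sketch, not the real PG code: the ToyPG class, the requeue() helper and
the scrub_first flag are made up for illustration, and it assumes that
requeued ops are appended to the op queue in the order the requeue calls
happen, which may not match what the OSD actually does.

```python
from collections import deque


class ToyPG:
    """Toy stand-in for a PG; hypothetical, not Ceph's implementation."""

    def __init__(self, scrub_first):
        self.op_queue = deque()
        self.waiting_for_missing_object = deque()
        self.waiting_for_active = deque()
        # Hypothetical switch: True models the current call order in
        # on_change(), False models the proposed reordering.
        self.scrub_first = scrub_first

    def requeue(self, waiters):
        # ASSUMPTION: requeued ops land at the back of op_queue, in the
        # order the requeue calls are made.
        self.op_queue.extend(waiters)
        waiters.clear()

    def scrub_clear_state(self):
        # Scrub puts ops on waiting_for_active, so clearing scrub state
        # requeues that list.
        self.requeue(self.waiting_for_active)

    def on_change(self):
        if self.scrub_first:
            self.scrub_clear_state()
            self.requeue(self.waiting_for_missing_object)
        else:
            self.requeue(self.waiting_for_missing_object)
            self.scrub_clear_state()


def replay(scrub_first):
    """Replay the trace: req 1 waits on the object, req 2 on active."""
    pg = ToyPG(scrub_first)
    pg.waiting_for_missing_object.append("req 1")
    pg.waiting_for_active.append("req 2")
    pg.on_change()
    return list(pg.op_queue)


if __name__ == "__main__":
    print(replay(scrub_first=True))   # ['req 2', 'req 1']
    print(replay(scrub_first=False))  # ['req 1', 'req 2']
```

Under that assumption, moving scrub_clear_state() after the
waiting_for_missing_object requeue restores the order for this trace, but it
says nothing about other interleavings of the two lists.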
It is probably worth mapping out what the wait lists order, where they
overlap, and carefully defining what order they need to be woken up in. I
suspect that re-using waiting_for_active in this case is problematic...
sage