> Instead of resending the message to the pool immediately, it just waits
> in the runbuffer, and the runbuffer is processed in reaction to any
> potential change in resources: NeedWork, ContainerRemoved, etc. This may
> add delay to any buffered message(s), but seems to avoid the catastrophic
> crash in our systems.
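
For anyone following along, here is a rough sketch of the buffering behavior
as I understand it from the description above. It is not the actual pool
code: the class, the method names, and the single "capacity" counter are all
made up for illustration, and only NeedWork and ContainerRemoved come from
the original message.

```python
from collections import deque


class ContainerPoolSketch:
    """Hypothetical stand-in for the pool; all names are illustrative."""

    def __init__(self, capacity):
        self.capacity = capacity   # assumed limit on busy containers
        self.busy = 0              # containers currently running work
        self.run_buffer = deque()  # messages waiting for resources
        self.started = []          # messages handed to a container

    def run(self, msg):
        # Buffer instead of resending to the pool immediately. Queue behind
        # anything already buffered so ordering is preserved.
        if self.run_buffer or self.busy >= self.capacity:
            self.run_buffer.append(msg)
        else:
            self._start(msg)

    def on_resource_change(self):
        # Invoked on any potential change in resources (for example
        # NeedWork or ContainerRemoved): drain as much as now fits.
        while self.run_buffer and self.busy < self.capacity:
            self._start(self.run_buffer.popleft())

    def container_removed(self):
        # A container went away, so capacity may have freed up.
        self.busy -= 1
        self.on_resource_change()

    def _start(self, msg):
        self.busy += 1
        self.started.append(msg)
```

The point of the sketch is that a buffered message is only retried when
something about the available resources may have changed, which is where the
extra delay comes from.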

This makes sense: from my recollection, rescheduling is really an indication
of something going wrong at the container level. A better solution might be
to reschedule the request to another invoker according to some fairness
criterion (though that is not easily guaranteed with the current architecture).

(This is testing my memory but...) we used to see these in a previous
incarnation of the scheduler as a precursor to the Docker daemon going out
to lunch. (Markus might remember better.)

-r
