Is it correct to say that the limited resource field is *only* meant to provide machine-readable information about which resource limits were exceeded?
If so, does it make sense to provide richer reporting fields for all failure reasons? I imagine other failure reasons could also benefit from reporting machine-readable details of the failure.

On Mon, Oct 9, 2017, 3:50 PM James Peach <jor...@gmail.com> wrote:

>
> > On Oct 9, 2017, at 1:27 PM, Vinod Kone <vinodk...@apache.org> wrote:
> >
> >> In the case that a task is killed because it violated a resource
> >> constraint (ie. the reason field is REASON_CONTAINER_LIMITATION,
> >> REASON_CONTAINER_LIMITATION_DISK or REASON_CONTAINER_LIMITATION_MEMORY),
> >> this field may be populated with the resource that triggered the
> >> limitation. This is intended to give better information to schedulers about
> >> task resource failures, in the expectation that it will help them bubble
> >> useful information up to the user or a monitoring system.
> >>
> >
> > Can you elaborate what schedulers are expected to do with this information?
> > Looking for some concrete use cases if you can.
>
> There's no concrete use case here; it's just a matter of propagating
> information we know in a structured way.
>
> If we assume that the scheduler knows about some sort of monitoring system
> or has a UI, we can present this to the user or a system that can take
> action on it. The status quo is that the raw message string is dumped to
> logs, and has to be manually interpreted.
>
> Additionally, this can pave the way to getting rid of
> REASON_CONTAINER_LIMITATION_DISK and REASON_CONTAINER_LIMITATION_MEMORY.
> All you really need is REASON_CONTAINER_LIMITATION plus the resource
> information.
>
> J
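To make the last point in the quoted thread concrete (a generic REASON_CONTAINER_LIMITATION plus resource information replacing the resource-specific reasons), here is a minimal Python sketch of what a scheduler might do with such a field. Only the reason names come from the thread; `Status`, `Limitation`, `normalize`, and `describe` are hypothetical, a simplified model rather than the actual Mesos protobuf or scheduler API. The sketch folds the legacy DISK/MEMORY reasons into the generic reason with a list of resource names, and falls back to the raw message string when no structured information is present.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Reason(Enum):
    # Reason names mirror those quoted in the thread; this enum is a stand-in.
    REASON_CONTAINER_LIMITATION = "REASON_CONTAINER_LIMITATION"
    REASON_CONTAINER_LIMITATION_DISK = "REASON_CONTAINER_LIMITATION_DISK"
    REASON_CONTAINER_LIMITATION_MEMORY = "REASON_CONTAINER_LIMITATION_MEMORY"


@dataclass
class Limitation:
    # Hypothetical: names of the resources whose limits were exceeded.
    resources: List[str] = field(default_factory=list)


@dataclass
class Status:
    # Hypothetical, simplified stand-in for a task status update.
    reason: Reason
    message: str = ""
    limitation: Optional[Limitation] = None


# Legacy resource-specific reasons and the resource each one implies.
_LEGACY = {
    Reason.REASON_CONTAINER_LIMITATION_DISK: "disk",
    Reason.REASON_CONTAINER_LIMITATION_MEMORY: "mem",
}


def normalize(status: Status) -> Status:
    """Fold a resource-specific reason into the generic reason plus resource info."""
    resource = _LEGACY.get(status.reason)
    if resource is None:
        return status  # Already generic (or unrelated); leave untouched.
    resources = list(status.limitation.resources) if status.limitation else []
    if resource not in resources:
        resources.append(resource)
    return Status(Reason.REASON_CONTAINER_LIMITATION, status.message, Limitation(resources))


def describe(status: Status) -> str:
    """Build a structured summary that a UI or monitoring system could display."""
    status = normalize(status)
    if (status.reason is Reason.REASON_CONTAINER_LIMITATION
            and status.limitation and status.limitation.resources):
        return "exceeded limit on: " + ", ".join(status.limitation.resources)
    # No structured data: fall back to the free-form message, as today.
    return status.message
```

For example, `describe(Status(Reason.REASON_CONTAINER_LIMITATION_MEMORY, "Memory limit exceeded"))` yields `"exceeded limit on: mem"`, while a generic limitation with no resource information just returns its raw message. The design point is that the scheduler only needs to branch on one reason and read the resource list, rather than enumerate a reason per resource type.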