Is it correct to say that the limited resource field is *only* meant to
provide machine-readable information about which resource limits were
exceeded?

If so, does it make sense to provide richer reporting fields for all
failure reasons? I imagine other failure reasons could also benefit from
reporting machine-readable details of the failure.
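
To make the question concrete, here is a rough sketch of how a scheduler
might consume such a field. Only the three reason names come from this
thread; the TaskStatus and LimitedResource classes, the limited_resources
field name and describe_failure() are hypothetical stand-ins of mine, not
the actual Mesos API.

```python
from dataclasses import dataclass, field
from typing import List

# Reason names as quoted in this thread; everything else is hypothetical.
LIMITATION_REASONS = {
    "REASON_CONTAINER_LIMITATION",
    "REASON_CONTAINER_LIMITATION_DISK",
    "REASON_CONTAINER_LIMITATION_MEMORY",
}


@dataclass
class LimitedResource:
    name: str     # e.g. "mem" or "disk" (assumed naming)
    value: float  # the limit that was exceeded, in the resource's own unit


@dataclass
class TaskStatus:
    reason: str
    message: str = ""
    # Hypothetical stand-in for the limited resource field under discussion.
    limited_resources: List[LimitedResource] = field(default_factory=list)


def describe_failure(status: TaskStatus) -> str:
    """Build a one-line summary a scheduler could pass to a UI or monitor."""
    if status.reason in LIMITATION_REASONS and status.limited_resources:
        names = ", ".join(sorted(r.name for r in status.limited_resources))
        return f"resource limit exceeded: {names}"
    # Without structured data, fall back to the raw message string,
    # which is the status quo.
    return status.message or status.reason
```

With something like this, the generic REASON_CONTAINER_LIMITATION plus the
resource list would carry the same information as the _DISK and _MEMORY
variants, and any other failure reason still falls back to the message.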

On Mon, Oct 9, 2017, 3:50 PM James Peach <jor...@gmail.com> wrote:

>
> > On Oct 9, 2017, at 1:27 PM, Vinod Kone <vinodk...@apache.org> wrote:
> >
> >> In the case that a task is killed because it violated a resource
> >> constraint (i.e. the reason field is REASON_CONTAINER_LIMITATION,
> >> REASON_CONTAINER_LIMITATION_DISK or REASON_CONTAINER_LIMITATION_MEMORY),
> >> this field may be populated with the resource that triggered the
> >> limitation. This is intended to give better information to schedulers
> >> about task resource failures, in the expectation that it will help them
> >> bubble useful information up to the user or a monitoring system.
> >>
> >
> > Can you elaborate what schedulers are expected to do with this
> > information? Looking for some concrete use cases if you can.
>
> There's no concrete use case here; it's just a matter of propagating
> information we know in a structured way.
>
> If we assume that the scheduler knows about some sort of monitoring system
> or has a UI, we can present this to the user or a system that can take
> action on it. The status quo is that the raw message string is dumped to
> logs, and has to be manually interpreted.
>
> Additionally, this can pave the way to getting rid of
> REASON_CONTAINER_LIMITATION_DISK and REASON_CONTAINER_LIMITATION_MEMORY.
> All you really need is REASON_CONTAINER_LIMITATION plus the resource
> information.
>
> J
>
>
