[
https://issues.apache.org/jira/browse/AURORA-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15475056#comment-15475056
]
Stephan Erb commented on AURORA-1766:
-------------------------------------
There seem to be a couple of options:
A) The user is completely oblivious to the existence of an executor overhead.
This is the state that is currently implemented. We don't charge executor
overhead against quota and we don't show it in the UI in any form. It is an
implementation detail that is only visible to the cluster operator, who
configures the overhead and also provisions the cluster with resources.
B) Like Mesos, we require that a job requests a minimum amount of CPU and RAM.
Everything below that threshold is rejected as we cannot guarantee it would
run. As Maxim indicated, this may cause trouble if we plan to deploy an executor
with completely different resource requirements.
C) We keep the executor overhead but account for it on the UI as suggested in
this ticket. This would break the simplicity that a user can look at his quota
and easily calculate how many instances he will be able to start. He could try
to factor in the executor overhead manually, but he cannot really do that
because it is only known to the cluster operator, who could change the
value at any time.
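To make the concern with Option C concrete, here is a minimal sketch of the
arithmetic a user would have to do. The function name, the quota and the task
size are hypothetical; 0.25 CPU is the default overhead mentioned in the
ticket, and the sketch assumes the overhead is added to every instance.

```python
import math


def max_instances(quota_cpu, task_cpu, executor_overhead_cpu=0.0):
    """Return how many instances fit into a CPU quota.

    executor_overhead_cpu is the per-instance overhead. It is an assumption
    here: in practice only the operator knows it and may change it.
    """
    per_instance = task_cpu + executor_overhead_cpu
    return math.floor(quota_cpu / per_instance)


# Hypothetical user: 100 CPUs of quota, 0.5 CPU requested per instance.
naive = max_instances(100, 0.5)          # 200: quota divided by task size
charged = max_instances(100, 0.5, 0.25)  # 133: overhead counted per instance
```

The second number is only right as long as the operator keeps the overhead at
0.25, which is exactly the value the user cannot see or rely on.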
I feel like Option A is the cleanest approach as operator and user
responsibilities and knowledge are clearly separated. If we find that option
unacceptable, Option B sounds like the next best option due to its simplicity.
I don't see Option C as viable, as it seems to blur the responsibilities of
users and operators.
Suggestion: We keep Option A but reconsider our default values for the executor
overhead, i.e., set CPU to 0.1 and RAM to whatever the new executor egg requires
(should be much less than the previous one).
> Account for thermos executor overhead in reported CPU usage to user
> -------------------------------------------------------------------
>
> Key: AURORA-1766
> URL: https://issues.apache.org/jira/browse/AURORA-1766
> Project: Aurora
> Issue Type: Story
> Components: Scheduler
> Reporter: Rick Mangi
>
> The default .25 CPU overhead taken for thermos can be hugely impactful on
> clusters with many tasks and it's not reported to the user in the web UI.
> Ideally the resources used by the cluster should be reported accurately. We
> ran into a case where we could not launch jobs even though we thought we had
> plenty of free cores.
> Alternatively, the thermos overhead could be removed completely. I think it's
> confusing personally.