[
https://issues.apache.org/jira/browse/AMBARI-23831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sandor Molnar updated AMBARI-23831:
-----------------------------------
Description: Created by accident; sorry for this. (was: The following
changes should be implemented:
1) For both secure and non-secure clusters:
- Use LinuxContainerExecutor:
{code:java}
"yarn.nodemanager.container-executor.class" =>
"org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor"
"yarn.nodemanager.linux-container-executor.resources-handler.class" =>
"org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler"
"yarn.nodemanager.linux-container-executor.cgroups.mount" => true (assume admin
won't mount cgroups ahead of time)
{code}
- Properly set up the permissions of container-executor and
container-executor.cfg (use the permissions applied today in secure mode).
- Further changes:
{code:java}
"yarn.nodemanager.resource.memory.enabled"
// The default value is false; set it to true to enable cgroups-based memory
// monitoring.
"yarn.nodemanager.resource.memory.cgroups.soft-limit-percentage"
// The default value is 90.0f, which means that under memory congestion the
// container can still keep/reserve 90% of the resources it claimed. It cannot
// be set above 100 or to a negative value.
"yarn.nodemanager.resource.memory.cgroups.swappiness"
// The percentage of memory that can be swapped. The default value is 0, which
// means container memory cannot be swapped out. If not set, the Linux cgroup
// default of 60 applies, which means 60% of memory can potentially be swapped
// out when system memory runs short.
"yarn.nodemanager.linux-container-executor.group"
// Set to the Unix group of the NodeManager, which should match the setting in
// “container-executor.cfg” (hadoop for Ambari?).
{code}
- For cgroups limitations:
{code:java}
"yarn.nodemanager.resource.percentage-physical-cpu-limit" -
This setting limits the CPU usage of all YARN containers. It sets a
hard upper limit on the cumulative CPU usage of the containers. For example, if
set to 60, the combined CPU usage of all YARN containers will not exceed 60%.
The YARN default value is 100.
"yarn.nodemanager.resource.cpu-vcores" - the number of vcores that can be
assigned to YARN containers. The YARN default is 8, but Ambari should set an
appropriate value based on the NM size, etc.
"yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage" -
CGroups allows cpu usage limits to be hard or soft. When this setting is true,
containers cannot use more CPU usage than allocated even if spare CPU is
available. This ensures that containers can only use CPU that they were
allocated. When set to false, containers can use spare CPU if available. It
should be noted that irrespective of whether set to true or false, at no time
can the combined CPU usage of all containers exceed the value specified in
“yarn.nodemanager.resource.percentage-physical-cpu-limit”.
After talking with peers: we ran into kernel panics when setting a hard limit
before, so setting this to true carries a risk. May need documentation?
{code}
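A rough sketch of the numeric rules described in the settings above, written in Python purely for illustration. The function names (`check_soft_limit`, `cpu_ceiling_cores`) are hypothetical and are not part of Ambari or YARN; the only rules encoded are the ones stated above (the soft limit may not be negative or above 100, and the CPU limit is a percentage of the node's CPU). The 8-core node is an assumed example value.

```python
def check_soft_limit(percentage):
    """Validate a soft-limit-percentage value: not negative, not above 100."""
    if percentage < 0 or percentage > 100:
        raise ValueError(
            "soft-limit-percentage must be within 0..100, got %r" % (percentage,)
        )
    return float(percentage)


def cpu_ceiling_cores(physical_cores, cpu_limit_percent=100):
    """Combined CPU, in cores, that all YARN containers on one node may use."""
    return physical_cores * cpu_limit_percent / 100.0


# Example from the description: a limit of 60. The core count (8) is an
# assumption for illustration only.
ceiling = cpu_ceiling_cores(8, 60)
soft_limit = check_soft_limit(90.0)
```

Per the description, this ceiling holds whether strict-resource-usage is true or false; that flag only decides whether a single container may borrow spare CPU beyond its own allocation.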
2) For a non-secure cluster (this needs to be done when moving from secure to
non-secure):
- In container-executor.cfg: remove "yarn" from the banned users
({{banned.users}}) and set {{min.user.id}} to 50.
- In yarn-site.xml, set:
{code:java}
yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users=true
yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user=yarn
{code}
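The container-executor.cfg change for the non-secure case could look roughly like the sketch below, in Python for illustration. It assumes the file is a list of plain `key=value` lines; `cfg_for_nonsecure` is a hypothetical helper, not an Ambari function, and the sample file content is made up.

```python
def cfg_for_nonsecure(cfg_text, min_user_id=50):
    """Return cfg_text with "yarn" dropped from banned.users and
    min.user.id set to min_user_id. Other lines are left untouched."""
    out = []
    for line in cfg_text.splitlines():
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key == "banned.users":
            users = [u.strip() for u in value.split(",")]
            users = [u for u in users if u and u != "yarn"]
            line = "banned.users=" + ",".join(users)
        elif sep and key == "min.user.id":
            line = "min.user.id=%d" % min_user_id
        out.append(line)
    return "\n".join(out)


# Sample input (assumed values, for illustration only).
sample = "banned.users=hdfs,yarn,mapred,bin\nmin.user.id=1000"
updated = cfg_for_nonsecure(sample)
```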
3) When moving from non-secure to secure:
- In container-executor.cfg:
Add the "yarn" user to the banned users ({{banned.users}}).
Set {{min.user.id}} to the existing default in Ambari (IIRC it's 1000).
- In yarn-site.xml, revert the following configs to:
{code:java}
yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users=false
yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user=nobody
{code}
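The yarn-site.xml values for the two directions in items 2) and 3) can be summarized in one small lookup, sketched here in Python for illustration. `nonsecure_mode_overrides` is a hypothetical name; the property names and values are copied from the description, nothing else is implied about how Ambari would apply them.

```python
PREFIX = "yarn.nodemanager.linux-container-executor.nonsecure-mode."


def nonsecure_mode_overrides(secure):
    """Return the two yarn-site.xml properties for the target cluster mode."""
    if secure:
        # Moving from non-secure to secure: revert both properties.
        return {PREFIX + "limit-users": "false", PREFIX + "local-user": "nobody"}
    # Moving from secure to non-secure.
    return {PREFIX + "limit-users": "true", PREFIX + "local-user": "yarn"}


to_secure = nonsecure_mode_overrides(secure=True)
to_nonsecure = nonsecure_mode_overrides(secure=False)
```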
)
> Ambari YARN Changes needed to enable CGroups + CPU Scheduling +
> LinuxContainerExecutor in both secure & Unsecure clusters
> -------------------------------------------------------------------------------------------------------------------------
>
> Key: AMBARI-23831
> URL: https://issues.apache.org/jira/browse/AMBARI-23831
> Project: Ambari
> Issue Type: Task
> Components: ambari-server
> Reporter: Sandor Molnar
> Assignee: Sandor Molnar
> Priority: Blocker
> Fix For: 2.7.0
>
>
> Created by accident; sorry for this.