[ 
https://issues.apache.org/jira/browse/AMBARI-23831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandor Molnar resolved AMBARI-23831.
------------------------------------
    Resolution: Won't Do

> Ambari YARN Changes needed to enable CGroups + CPU Scheduling + 
> LinuxContainerExecutor in both secure & Unsecure clusters
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: AMBARI-23831
>                 URL: https://issues.apache.org/jira/browse/AMBARI-23831
>             Project: Ambari
>          Issue Type: Task
>          Components: ambari-server
>            Reporter: Sandor Molnar
>            Assignee: Sandor Molnar
>            Priority: Blocker
>             Fix For: 2.7.0
>
>
> The following changes should be implemented:
> 1) For both secure and non-secure clusters:
>  - Use LinuxContainerExecutor:
> {code:java}
> "yarn.nodemanager.container-executor.class" =>
> "org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor"
> "yarn.nodemanager.linux-container-executor.resources-handler.class" =>
> "org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler"
> "yarn.nodemanager.linux-container-executor.cgroups.mount" => true
> // assume the admin will not have mounted cgroups ahead of time
> {code}
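For reference, the same three settings in yarn-site.xml form would look roughly like this (a sketch; Ambari renders these from its stack configuration, so only the property names and values above are firm):

```xml
<!-- Sketch of the yarn-site.xml entries described above -->
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.cgroups.mount</name>
  <value>true</value>
</property>
```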
>  - Properly set up the permissions of container-executor / container-executor.cfg
> (use the permissions secure mode uses today).
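As a sketch of the permission scheme secure mode uses today (container-executor owned by root with the NodeManager's Unix group and mode 6050), applied here to a temporary stand-in file, since the real binary path and the root:hadoop ownership step are out of reach in a runnable example:

```python
import os
import stat
import tempfile

def harden_container_executor(path):
    # In a real deployment the binary must also be owned by root with the
    # NodeManager's group (e.g. "hadoop"), which requires root, e.g.:
    #   os.chown(path, 0, hadoop_gid)
    # Mode 6050 = setuid + setgid, group-executable only.
    os.chmod(path, 0o6050)
    return stat.S_IMODE(os.stat(path).st_mode)

# Stand-in for the container-executor binary; the path is a placeholder.
with tempfile.NamedTemporaryFile(delete=False) as f:
    placeholder = f.name
mode = harden_container_executor(placeholder)
print(oct(mode))
os.unlink(placeholder)
```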
>  - Further changes:
> {code:java}
> "yarn.nodemanager.resource.memory.enabled"
> // Defaults to false; set to true here to enable cgroups-based memory
> // monitoring.
> "yarn.nodemanager.resource.memory.cgroups.soft-limit-percentage"
> // Defaults to 90.0f: under memory congestion, a container can still
> // keep/reserve 90% of its claimed memory. It cannot be set above 100 or to a
> // negative value.
> "yarn.nodemanager.resource.memory.cgroups.swappiness"
> // The percentage of container memory that may be swapped out. Defaults to 0,
> // meaning container memory cannot be swapped out. If unset, the Linux cgroup
> // default of 60 applies, meaning up to 60% of memory can potentially be
> // swapped out when system memory runs short.
> "yarn.nodemanager.linux-container-executor.group"
> // Set to the Unix group of the NodeManager; it must match the setting in
> // "container-executor.cfg" (hadoop for Ambari?).
> {code}
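To make the soft-limit semantics concrete, a small illustrative helper (the function name and the 4096 MB figure are made up for the example; only the 90.0f default and the [0, 100] constraint come from the description above):

```python
def cgroup_soft_limit_mb(claimed_mb, soft_limit_pct=90.0):
    """Memory a container keeps/reserves under congestion, given the
    soft-limit percentage (YARN default 90.0f)."""
    # Mirrors the documented constraint: not above 100 and not negative.
    if soft_limit_pct < 0 or soft_limit_pct > 100:
        raise ValueError("soft-limit-percentage must be in [0, 100]")
    return claimed_mb * soft_limit_pct / 100.0

# A container claiming 4096 MB keeps 3686.4 MB under the 90.0f default.
print(cgroup_soft_limit_mb(4096))  # 3686.4
```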
>  - For cgroups limitations:
> {code:java}
> "yarn.nodemanager.resource.percentage-physical-cpu-limit"
> // Limits the CPU usage of all YARN containers: a hard upper limit on their
> // cumulative CPU usage. For example, if set to 60, the combined CPU usage of
> // all YARN containers will not exceed 60%. The YARN default is 100.
> "yarn.nodemanager.resource.cpu-vcores"
> // The number of vcores that can be assigned to YARN containers. The YARN
> // default is 8, but Ambari should set a proper value considering the
> // NodeManager size, etc.
> "yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage"
> // CGroups allows CPU limits to be hard or soft. When true, containers cannot
> // use more CPU than allocated even if spare CPU is available; they can only
> // use the CPU they were allocated. When false, containers can use spare CPU
> // if available. Note that regardless of this setting, the combined CPU usage
> // of all containers can never exceed the value specified in
> // "yarn.nodemanager.resource.percentage-physical-cpu-limit".
> // Per discussion with peers, we have run into kernel panics when setting the
> // hard limit before, so there is risk in setting this to true; this may need
> // documentation.
> {code}
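A rough back-of-the-envelope sketch of how these settings combine under strict resource usage (the helper name and the sample numbers are illustrative, not an actual YARN API):

```python
def container_cpu_pct(container_vcores, nm_vcores=8, physical_cpu_limit=60):
    """Approximate hard CPU cap for one container when
    strict-resource-usage=true: the node-wide cap
    (percentage-physical-cpu-limit) is divided proportionally by the
    container's share of the NodeManager's cpu-vcores."""
    return physical_cpu_limit * container_vcores / nm_vcores

# With the node capped at 60% and 8 NM vcores, a 2-vcore container is
# confined to roughly 15% of the node's physical CPU.
print(container_cpu_pct(2))  # 15.0
```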
> 2) For non-secure clusters (this needs to be done when we move from secure to
> non-secure):
>  - In container-executor.cfg: remove "yarn" from the banned users
> ({{banned.users}}) and set {{min.user.id}} to 50.
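A sketch of the resulting container-executor.cfg in non-secure mode (the group name and the remaining banned-user entries are assumptions, not verified Ambari defaults):

```ini
# Sketch: container-executor.cfg for a non-secure cluster.
yarn.nodemanager.linux-container-executor.group=hadoop
# "yarn" removed from the banned list; other entries assumed unchanged.
banned.users=hdfs,mapred,bin
min.user.id=50
```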
>  - In yarn-site.xml: change:
> {code:java}
> yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users=true
> yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user=yarn
> {code}
> 3) When moving from non-secure to secure:
>  - In container-executor.cfg:
>  Add "yarn" to the banned users ({{banned.users}}).
>  Set {{min.user.id}} back to the existing Ambari default (IIRC it's 1000).
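And the corresponding secure-mode sketch of container-executor.cfg (same caveats: the group and the other banned-user entries are assumptions):

```ini
# Sketch: container-executor.cfg for a secure cluster.
yarn.nodemanager.linux-container-executor.group=hadoop
# "yarn" added back to the banned list.
banned.users=hdfs,yarn,mapred,bin
min.user.id=1000
```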
>  - Revert the following yarn-site.xml configs to:
> {code:java}
> yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users=false
> yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user=nobody
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
