[
https://issues.apache.org/jira/browse/AMBARI-23270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Swapan Shridhar resolved AMBARI-23270.
--------------------------------------
Resolution: Invalid
> Stack Advisor and LLAP. Update Stack Advisor's capacity-scheduler walk
> through to ignore YARN Node labelling string "accessible-node-labels" for
> queues.
> --------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: AMBARI-23270
> URL: https://issues.apache.org/jira/browse/AMBARI-23270
> Project: Ambari
> Issue Type: Bug
> Components: ambari-server
> Reporter: Swapan Shridhar
> Assignee: Swapan Shridhar
> Priority: Major
> Fix For: trunk, 2.7.0
>
>
> *Issue:* A Stack Advisor (SA) call returns a 500 error (breaks) during LLAP
> calculations if *YARN Node labelling* is enabled, because the node-label
> settings make their way into *capacity-scheduler*. SA does a
> *capacity-scheduler* walkthrough to figure out the capacity of the queue used
> by LLAP, which it needs for the LLAP calculations.
> When YARN Node labelling is enabled, the capacity-scheduler looks like this
> (note the presence of the string *"accessible-node-labels"*):
> {code:title=capacity-scheduler with YARN Node Labelling enabled}
> yarn.scheduler.capacity.maximum-am-resource-percent=0.4
> yarn.scheduler.capacity.maximum-applications=10000
> yarn.scheduler.capacity.node-locality-delay=40
> yarn.scheduler.capacity.root.accessible-node-labels=nonllap,lowmem,llap
> yarn.scheduler.capacity.root.acl_administer_queue=*
> yarn.scheduler.capacity.root.capacity=100
> yarn.scheduler.capacity.root.default.acl_submit_applications=*
> yarn.scheduler.capacity.root.default.capacity=5
> yarn.scheduler.capacity.root.default.maximum-capacity=10
> yarn.scheduler.capacity.root.default.state=RUNNING
> yarn.scheduler.capacity.root.default.user-limit-factor=1
> yarn.scheduler.capacity.root.queues=default,llap,users
> yarn.scheduler.capacity.queue-mappings-override.enable=false
> yarn.scheduler.capacity.root.accessible-node-labels.llap.capacity=100
> yarn.scheduler.capacity.root.accessible-node-labels.llap.maximum-capacity=100
> yarn.scheduler.capacity.root.accessible-node-labels.lowmem.capacity=100
> yarn.scheduler.capacity.root.accessible-node-labels.lowmem.maximum-capacity=100
> yarn.scheduler.capacity.root.accessible-node-labels.nonllap.capacity=100
> yarn.scheduler.capacity.root.accessible-node-labels.nonllap.maximum-capacity=100
> yarn.scheduler.capacity.root.default.accessible-node-labels=nonllap,lowmem
> yarn.scheduler.capacity.root.default.accessible-node-labels.lowmem.capacity=20
> yarn.scheduler.capacity.root.default.accessible-node-labels.lowmem.maximum-capacity=20
> yarn.scheduler.capacity.root.default.accessible-node-labels.nonllap.capacity=20
> yarn.scheduler.capacity.root.default.accessible-node-labels.nonllap.maximum-capacity=20
> yarn.scheduler.capacity.root.default.default-node-label-expression=nonllap
> yarn.scheduler.capacity.root.default.priority=0
> yarn.scheduler.capacity.root.llap.accessible-node-labels=llap
> yarn.scheduler.capacity.root.llap.accessible-node-labels.llap.capacity=100
> yarn.scheduler.capacity.root.llap.accessible-node-labels.llap.maximum-capacity=100
> yarn.scheduler.capacity.root.llap.acl_administer_queue=*
> yarn.scheduler.capacity.root.llap.acl_submit_applications=*
> yarn.scheduler.capacity.root.llap.capacity=90
> yarn.scheduler.capacity.root.llap.default-node-label-expression=llap
> yarn.scheduler.capacity.root.llap.maximum-capacity=90
> yarn.scheduler.capacity.root.llap.minimum-user-limit-percent=100
> yarn.scheduler.capacity.root.llap.ordering-policy=fifo
> yarn.scheduler.capacity.root.llap.priority=0
> yarn.scheduler.capacity.root.llap.state=RUNNING
> yarn.scheduler.capacity.root.llap.user-limit-factor=1
> yarn.scheduler.capacity.root.maximum-capacity=100
> yarn.scheduler.capacity.root.priority=0
> yarn.scheduler.capacity.root.users.accessible-node-labels=nonllap,lowmem
> yarn.scheduler.capacity.root.users.accessible-node-labels.lowmem.capacity=80
> yarn.scheduler.capacity.root.users.accessible-node-labels.lowmem.maximum-capacity=80
> yarn.scheduler.capacity.root.users.accessible-node-labels.nonllap.capacity=80
> yarn.scheduler.capacity.root.users.accessible-node-labels.nonllap.maximum-capacity=80
> yarn.scheduler.capacity.root.users.acl_administer_queue=*
> yarn.scheduler.capacity.root.users.acl_submit_applications=*
> yarn.scheduler.capacity.root.users.analyst.accessible-node-labels=nonllap,lowmem
> yarn.scheduler.capacity.root.users.analyst.accessible-node-labels.lowmem.capacity=50
> yarn.scheduler.capacity.root.users.analyst.accessible-node-labels.lowmem.maximum-capacity=50
> yarn.scheduler.capacity.root.users.analyst.accessible-node-labels.nonllap.capacity=50
> yarn.scheduler.capacity.root.users.analyst.accessible-node-labels.nonllap.maximum-capacity=50
> yarn.scheduler.capacity.root.users.analyst.acl_administer_queue=*
> yarn.scheduler.capacity.root.users.analyst.acl_submit_applications=*
> yarn.scheduler.capacity.root.users.analyst.capacity=50
> yarn.scheduler.capacity.root.users.analyst.maximum-capacity=80
> yarn.scheduler.capacity.root.users.analyst.minimum-user-limit-percent=100
> yarn.scheduler.capacity.root.users.analyst.ordering-policy=fifo
> yarn.scheduler.capacity.root.users.analyst.priority=0
> yarn.scheduler.capacity.root.users.analyst.state=RUNNING
> yarn.scheduler.capacity.root.users.analyst.user-limit-factor=1
> yarn.scheduler.capacity.root.users.capacity=5
> yarn.scheduler.capacity.root.users.default-node-label-expression=nonllap
> yarn.scheduler.capacity.root.users.engineering.accessible-node-labels=nonllap,lowmem
> yarn.scheduler.capacity.root.users.engineering.accessible-node-labels.lowmem.capacity=50
> yarn.scheduler.capacity.root.users.engineering.accessible-node-labels.lowmem.maximum-capacity=50
> yarn.scheduler.capacity.root.users.engineering.accessible-node-labels.nonllap.capacity=50
> yarn.scheduler.capacity.root.users.engineering.accessible-node-labels.nonllap.maximum-capacity=50
> yarn.scheduler.capacity.root.users.engineering.acl_administer_queue=*
> yarn.scheduler.capacity.root.users.engineering.acl_submit_applications=*
> yarn.scheduler.capacity.root.users.engineering.capacity=50
> yarn.scheduler.capacity.root.users.engineering.maximum-capacity=80
> yarn.scheduler.capacity.root.users.engineering.minimum-user-limit-percent=100
> yarn.scheduler.capacity.root.users.engineering.ordering-policy=fifo
> yarn.scheduler.capacity.root.users.engineering.priority=0
> yarn.scheduler.capacity.root.users.engineering.state=RUNNING
> yarn.scheduler.capacity.root.users.engineering.user-limit-factor=1
> yarn.scheduler.capacity.root.users.maximum-capacity=80
> yarn.scheduler.capacity.root.users.minimum-user-limit-percent=100
> yarn.scheduler.capacity.root.users.priority=0
> yarn.scheduler.capacity.root.users.queues=analyst,engineering
> yarn.scheduler.capacity.root.users.state=RUNNING
> yarn.scheduler.capacity.root.users.user-limit-factor=1
> {code}
> *Reason why it breaks:* SA code is not aware of Node labelling in general.
> Thus, when it tries to calculate the capacity of the queue selected for LLAP
> (for example, the *'llap'* queue), it does the following:
> It looks for the line showing the capacity of the *'llap'* queue and fetches
> this line:
> {code}
> yarn.scheduler.capacity.root.llap.accessible-node-labels.llap.capacity=100
> {code}
> - It then looks for the memory percentage of each queue in the path: [root,
> accessible-node-labels, llap]
> - But there is no .capacity associated with *accessible-node-labels*, because
> it is not a queue.
> - Thus the walkthrough fails.
> *Fix:*
> Added code that skips such lines when *accessible-node-labels* is detected,
> i.e. when YARN Node Labelling is enabled.
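
The walkthrough and the skip described above can be sketched roughly as follows. This is a minimal illustration, not the actual Stack Advisor code: the names parse_properties, queue_capacity_fraction, NODE_LABEL_TOKEN and PREFIX are hypothetical, and multiplying the .capacity of each queue along the path is an assumption about how the queue's share is derived.

```python
# Hypothetical sketch; not the Ambari Stack Advisor implementation.
NODE_LABEL_TOKEN = "accessible-node-labels"
PREFIX = "yarn.scheduler.capacity."


def parse_properties(text):
    """Turn 'key=value' lines of capacity-scheduler into a dict."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def queue_capacity_fraction(props, queue_name):
    """Return the queue's share of the cluster as a fraction (0.0 to 1.0).

    Keys containing NODE_LABEL_TOKEN are skipped, so a line such as
    root.llap.accessible-node-labels.llap.capacity is never mistaken
    for the capacity line of the 'llap' queue. Returns None if no
    plain capacity line exists for the queue.
    """
    suffix = "." + queue_name + ".capacity"
    candidates = sorted(
        key for key in props
        if key.startswith(PREFIX)
        and key.endswith(suffix)
        and NODE_LABEL_TOKEN not in key  # the skip
    )
    if not candidates:
        return None
    # e.g. 'root.llap' -> ['root', 'llap']
    path = candidates[0][len(PREFIX):-len(".capacity")].split(".")
    fraction = 1.0
    # Walk from root down to the queue (assumed: shares multiply).
    for depth in range(1, len(path) + 1):
        cap_key = PREFIX + ".".join(path[:depth]) + ".capacity"
        fraction *= float(props[cap_key]) / 100.0
    return fraction
```

Without the NODE_LABEL_TOKEN check, the node-label line for 'llap' would also match the suffix, and the walk would then look up a .capacity for "accessible-node-labels", which does not exist.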