[ 
https://issues.apache.org/jira/browse/AMBARI-23270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swapan Shridhar updated AMBARI-23270:
-------------------------------------
    Description: 
*Issue:* The Stack Advisor (SA) call returns a 500 error during LLAP 
calculations when *YARN Node labelling* is enabled and its settings make their 
way into *capacity-scheduler*. SA walks through *capacity-scheduler* to 
determine the capacity of the queue used by LLAP, which it needs for the LLAP 
calculations.

When YARN Node labelling is enabled, the capacity-scheduler looks like this 
(note the presence of the string *"accessible-node-labels"*):

{code:title=capacity-scheduler with YARN Node Labelling enabled}
yarn.scheduler.capacity.maximum-am-resource-percent=0.4
yarn.scheduler.capacity.maximum-applications=10000
yarn.scheduler.capacity.node-locality-delay=40
yarn.scheduler.capacity.root.accessible-node-labels=nonllap,lowmem,llap
yarn.scheduler.capacity.root.acl_administer_queue=*
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.default.acl_submit_applications=*
yarn.scheduler.capacity.root.default.capacity=5
yarn.scheduler.capacity.root.default.maximum-capacity=10
yarn.scheduler.capacity.root.default.state=RUNNING
yarn.scheduler.capacity.root.default.user-limit-factor=1
yarn.scheduler.capacity.root.queues=default,llap,users
yarn.scheduler.capacity.queue-mappings-override.enable=false
yarn.scheduler.capacity.root.accessible-node-labels.llap.capacity=100
yarn.scheduler.capacity.root.accessible-node-labels.llap.maximum-capacity=100
yarn.scheduler.capacity.root.accessible-node-labels.lowmem.capacity=100
yarn.scheduler.capacity.root.accessible-node-labels.lowmem.maximum-capacity=100
yarn.scheduler.capacity.root.accessible-node-labels.nonllap.capacity=100
yarn.scheduler.capacity.root.accessible-node-labels.nonllap.maximum-capacity=100
yarn.scheduler.capacity.root.default.accessible-node-labels=nonllap,lowmem
yarn.scheduler.capacity.root.default.accessible-node-labels.lowmem.capacity=20
yarn.scheduler.capacity.root.default.accessible-node-labels.lowmem.maximum-capacity=20
yarn.scheduler.capacity.root.default.accessible-node-labels.nonllap.capacity=20
yarn.scheduler.capacity.root.default.accessible-node-labels.nonllap.maximum-capacity=20
yarn.scheduler.capacity.root.default.default-node-label-expression=nonllap
yarn.scheduler.capacity.root.default.priority=0
yarn.scheduler.capacity.root.llap.accessible-node-labels=llap
yarn.scheduler.capacity.root.llap.accessible-node-labels.llap.capacity=100
yarn.scheduler.capacity.root.llap.accessible-node-labels.llap.maximum-capacity=100
yarn.scheduler.capacity.root.llap.acl_administer_queue=*
yarn.scheduler.capacity.root.llap.acl_submit_applications=*
yarn.scheduler.capacity.root.llap.capacity=90
yarn.scheduler.capacity.root.llap.default-node-label-expression=llap
yarn.scheduler.capacity.root.llap.maximum-capacity=90
yarn.scheduler.capacity.root.llap.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.llap.ordering-policy=fifo
yarn.scheduler.capacity.root.llap.priority=0
yarn.scheduler.capacity.root.llap.state=RUNNING
yarn.scheduler.capacity.root.llap.user-limit-factor=1
yarn.scheduler.capacity.root.maximum-capacity=100
yarn.scheduler.capacity.root.priority=0
yarn.scheduler.capacity.root.users.accessible-node-labels=nonllap,lowmem
yarn.scheduler.capacity.root.users.accessible-node-labels.lowmem.capacity=80
yarn.scheduler.capacity.root.users.accessible-node-labels.lowmem.maximum-capacity=80
yarn.scheduler.capacity.root.users.accessible-node-labels.nonllap.capacity=80
yarn.scheduler.capacity.root.users.accessible-node-labels.nonllap.maximum-capacity=80
yarn.scheduler.capacity.root.users.acl_administer_queue=*
yarn.scheduler.capacity.root.users.acl_submit_applications=*
yarn.scheduler.capacity.root.users.analyst.accessible-node-labels=nonllap,lowmem
yarn.scheduler.capacity.root.users.analyst.accessible-node-labels.lowmem.capacity=50
yarn.scheduler.capacity.root.users.analyst.accessible-node-labels.lowmem.maximum-capacity=50
yarn.scheduler.capacity.root.users.analyst.accessible-node-labels.nonllap.capacity=50
yarn.scheduler.capacity.root.users.analyst.accessible-node-labels.nonllap.maximum-capacity=50
yarn.scheduler.capacity.root.users.analyst.acl_administer_queue=*
yarn.scheduler.capacity.root.users.analyst.acl_submit_applications=*
yarn.scheduler.capacity.root.users.analyst.capacity=50
yarn.scheduler.capacity.root.users.analyst.maximum-capacity=80
yarn.scheduler.capacity.root.users.analyst.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.users.analyst.ordering-policy=fifo
yarn.scheduler.capacity.root.users.analyst.priority=0
yarn.scheduler.capacity.root.users.analyst.state=RUNNING
yarn.scheduler.capacity.root.users.analyst.user-limit-factor=1
yarn.scheduler.capacity.root.users.capacity=5
yarn.scheduler.capacity.root.users.default-node-label-expression=nonllap
yarn.scheduler.capacity.root.users.engineering.accessible-node-labels=nonllap,lowmem
yarn.scheduler.capacity.root.users.engineering.accessible-node-labels.lowmem.capacity=50
yarn.scheduler.capacity.root.users.engineering.accessible-node-labels.lowmem.maximum-capacity=50
yarn.scheduler.capacity.root.users.engineering.accessible-node-labels.nonllap.capacity=50
yarn.scheduler.capacity.root.users.engineering.accessible-node-labels.nonllap.maximum-capacity=50
yarn.scheduler.capacity.root.users.engineering.acl_administer_queue=*
yarn.scheduler.capacity.root.users.engineering.acl_submit_applications=*
yarn.scheduler.capacity.root.users.engineering.capacity=50
yarn.scheduler.capacity.root.users.engineering.maximum-capacity=80
yarn.scheduler.capacity.root.users.engineering.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.users.engineering.ordering-policy=fifo
yarn.scheduler.capacity.root.users.engineering.priority=0
yarn.scheduler.capacity.root.users.engineering.state=RUNNING
yarn.scheduler.capacity.root.users.engineering.user-limit-factor=1
yarn.scheduler.capacity.root.users.maximum-capacity=80
yarn.scheduler.capacity.root.users.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.users.priority=0
yarn.scheduler.capacity.root.users.queues=analyst,engineering
yarn.scheduler.capacity.root.users.state=RUNNING
yarn.scheduler.capacity.root.users.user-limit-factor=1
{code}


*Why it breaks:* The SA code is not aware of Node labelling. 
Thus, when it tries to calculate the capacity of the queue selected for LLAP 
(for example, the *'llap'* queue), it does the following:

It looks for the line showing the capacity of the *'llap'* queue and fetches 
this line:
{code}
yarn.scheduler.capacity.root.llap.accessible-node-labels.llap.capacity=100
{code}

- It then looks up the memory percentage for each of the queues [root, 
accessible-node-labels, llap].
- But *accessible-node-labels* is not a queue, so it has no .capacity entry.
- Thus the walkthrough fails.
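
The failure described above can be reproduced with a minimal Python sketch. 
This is not the actual Stack Advisor code: the function names 
(find_capacity_key, queue_path, effective_capacity) and the lookup strategy are 
assumptions made for illustration, and only three of the properties listed 
earlier are used. In this sketch the fetched key yields the path [root, llap, 
accessible-node-labels, llap], and the walk stops at the 
accessible-node-labels segment.

```python
# Hypothetical sketch, NOT Ambari's actual Stack Advisor implementation.
PREFIX = "yarn.scheduler.capacity."

# A small subset of the capacity-scheduler properties listed earlier.
props = {
    "yarn.scheduler.capacity.root.capacity": "100",
    "yarn.scheduler.capacity.root.llap.capacity": "90",
    "yarn.scheduler.capacity.root.llap.accessible-node-labels.llap.capacity": "100",
}


def find_capacity_key(props, queue):
    """Return the first key (in sorted order) ending in '.<queue>.capacity'."""
    suffix = "." + queue + ".capacity"
    for key in sorted(props):
        if key.startswith(PREFIX) and key.endswith(suffix):
            return key
    return None


def queue_path(key):
    """Drop the prefix and the trailing '.capacity', then split into segments."""
    return key[len(PREFIX):-len(".capacity")].split(".")


def effective_capacity(props, path):
    """Multiply the capacity percentage found at every level of the path."""
    fraction = 1.0
    for depth in range(1, len(path) + 1):
        key = PREFIX + ".".join(path[:depth]) + ".capacity"
        # Raises KeyError when a path segment has no '.capacity' property.
        fraction *= float(props[key]) / 100.0
    return fraction


key = find_capacity_key(props, "llap")
try:
    effective_capacity(props, queue_path(key))
except KeyError as missing:
    print("walkthrough failed, no such property:", missing)
```

The label-specific key is picked up because it also ends in '.llap.capacity', 
so the naive match cannot tell a node-label entry from a queue entry.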



*Fix:*

Added code to skip entries containing *accessible-node-labels*, which indicate 
that YARN Node Labelling is enabled.
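
A minimal Python sketch of such a skip follows. It is an illustration under 
stated assumptions, not the committed patch: the names NODE_LABEL_MARKER, 
find_queue_capacity_key and effective_capacity are hypothetical, and the sample 
properties are a subset of the listing in this description.

```python
# Hypothetical sketch of the skip; names are illustrative, not Ambari's.
PREFIX = "yarn.scheduler.capacity."
NODE_LABEL_MARKER = "accessible-node-labels"

# Subset of the capacity-scheduler properties from this description.
props = {
    "yarn.scheduler.capacity.root.capacity": "100",
    "yarn.scheduler.capacity.root.accessible-node-labels.llap.capacity": "100",
    "yarn.scheduler.capacity.root.llap.capacity": "90",
    "yarn.scheduler.capacity.root.llap.accessible-node-labels.llap.capacity": "100",
    "yarn.scheduler.capacity.root.users.capacity": "5",
    "yarn.scheduler.capacity.root.users.analyst.capacity": "50",
    "yarn.scheduler.capacity.root.users.analyst.accessible-node-labels.lowmem.capacity": "50",
}


def find_queue_capacity_key(props, queue):
    """Find '<...>.<queue>.capacity', ignoring node-label entries."""
    suffix = "." + queue + ".capacity"
    for key in sorted(props):
        if NODE_LABEL_MARKER in key.split("."):
            continue  # node-label setting, not a queue capacity
        if key.startswith(PREFIX) and key.endswith(suffix):
            return key
    return None


def effective_capacity(props, queue):
    """Return the queue's share of the cluster as a fraction between 0 and 1."""
    key = find_queue_capacity_key(props, queue)
    if key is None:
        raise ValueError("no capacity entry found for queue %r" % queue)
    path = key[len(PREFIX):-len(".capacity")].split(".")
    fraction = 1.0
    for depth in range(1, len(path) + 1):
        level_key = PREFIX + ".".join(path[:depth]) + ".capacity"
        fraction *= float(props[level_key]) / 100.0
    return fraction


print(effective_capacity(props, "llap"))
print(effective_capacity(props, "analyst"))
```

With the label entries filtered out, the walk only ever visits real queue 
levels (root, then llap; or root, users, analyst), so every lookup finds a 
.capacity property.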



  was:
*Issue:* The Stack Advisor (SA) call returns a 500 error during LLAP 
calculations when *YARN Node labelling* is enabled and its settings make their 
way into *capacity-scheduler*. SA walks through *capacity-scheduler* to 
determine the capacity of the queue used by LLAP, which it needs for the LLAP 
calculations.

When YARN Node labelling is enabled, the capacity-scheduler looks like this 
(note the presence of the string *"accessible-node-labels"*):

{code:title=capacity-scheduler}
yarn.scheduler.capacity.maximum-am-resource-percent=0.4
yarn.scheduler.capacity.maximum-applications=10000
yarn.scheduler.capacity.node-locality-delay=40
yarn.scheduler.capacity.root.accessible-node-labels=nonllap,lowmem,llap
yarn.scheduler.capacity.root.acl_administer_queue=*
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.default.acl_submit_applications=*
yarn.scheduler.capacity.root.default.capacity=5
yarn.scheduler.capacity.root.default.maximum-capacity=10
yarn.scheduler.capacity.root.default.state=RUNNING
yarn.scheduler.capacity.root.default.user-limit-factor=1
yarn.scheduler.capacity.root.queues=default,llap,users
yarn.scheduler.capacity.queue-mappings-override.enable=false
yarn.scheduler.capacity.root.accessible-node-labels.llap.capacity=100
yarn.scheduler.capacity.root.accessible-node-labels.llap.maximum-capacity=100
yarn.scheduler.capacity.root.accessible-node-labels.lowmem.capacity=100
yarn.scheduler.capacity.root.accessible-node-labels.lowmem.maximum-capacity=100
yarn.scheduler.capacity.root.accessible-node-labels.nonllap.capacity=100
yarn.scheduler.capacity.root.accessible-node-labels.nonllap.maximum-capacity=100
yarn.scheduler.capacity.root.default.accessible-node-labels=nonllap,lowmem
yarn.scheduler.capacity.root.default.accessible-node-labels.lowmem.capacity=20
yarn.scheduler.capacity.root.default.accessible-node-labels.lowmem.maximum-capacity=20
yarn.scheduler.capacity.root.default.accessible-node-labels.nonllap.capacity=20
yarn.scheduler.capacity.root.default.accessible-node-labels.nonllap.maximum-capacity=20
yarn.scheduler.capacity.root.default.default-node-label-expression=nonllap
yarn.scheduler.capacity.root.default.priority=0
yarn.scheduler.capacity.root.llap.accessible-node-labels=llap
yarn.scheduler.capacity.root.llap.accessible-node-labels.llap.capacity=100
yarn.scheduler.capacity.root.llap.accessible-node-labels.llap.maximum-capacity=100
yarn.scheduler.capacity.root.llap.acl_administer_queue=*
yarn.scheduler.capacity.root.llap.acl_submit_applications=*
yarn.scheduler.capacity.root.llap.capacity=90
yarn.scheduler.capacity.root.llap.default-node-label-expression=llap
yarn.scheduler.capacity.root.llap.maximum-capacity=90
yarn.scheduler.capacity.root.llap.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.llap.ordering-policy=fifo
yarn.scheduler.capacity.root.llap.priority=0
yarn.scheduler.capacity.root.llap.state=RUNNING
yarn.scheduler.capacity.root.llap.user-limit-factor=1
yarn.scheduler.capacity.root.maximum-capacity=100
yarn.scheduler.capacity.root.priority=0
yarn.scheduler.capacity.root.users.accessible-node-labels=nonllap,lowmem
yarn.scheduler.capacity.root.users.accessible-node-labels.lowmem.capacity=80
yarn.scheduler.capacity.root.users.accessible-node-labels.lowmem.maximum-capacity=80
yarn.scheduler.capacity.root.users.accessible-node-labels.nonllap.capacity=80
yarn.scheduler.capacity.root.users.accessible-node-labels.nonllap.maximum-capacity=80
yarn.scheduler.capacity.root.users.acl_administer_queue=*
yarn.scheduler.capacity.root.users.acl_submit_applications=*
yarn.scheduler.capacity.root.users.analyst.accessible-node-labels=nonllap,lowmem
yarn.scheduler.capacity.root.users.analyst.accessible-node-labels.lowmem.capacity=50
yarn.scheduler.capacity.root.users.analyst.accessible-node-labels.lowmem.maximum-capacity=50
yarn.scheduler.capacity.root.users.analyst.accessible-node-labels.nonllap.capacity=50
yarn.scheduler.capacity.root.users.analyst.accessible-node-labels.nonllap.maximum-capacity=50
yarn.scheduler.capacity.root.users.analyst.acl_administer_queue=*
yarn.scheduler.capacity.root.users.analyst.acl_submit_applications=*
yarn.scheduler.capacity.root.users.analyst.capacity=50
yarn.scheduler.capacity.root.users.analyst.maximum-capacity=80
yarn.scheduler.capacity.root.users.analyst.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.users.analyst.ordering-policy=fifo
yarn.scheduler.capacity.root.users.analyst.priority=0
yarn.scheduler.capacity.root.users.analyst.state=RUNNING
yarn.scheduler.capacity.root.users.analyst.user-limit-factor=1
yarn.scheduler.capacity.root.users.capacity=5
yarn.scheduler.capacity.root.users.default-node-label-expression=nonllap
yarn.scheduler.capacity.root.users.engineering.accessible-node-labels=nonllap,lowmem
yarn.scheduler.capacity.root.users.engineering.accessible-node-labels.lowmem.capacity=50
yarn.scheduler.capacity.root.users.engineering.accessible-node-labels.lowmem.maximum-capacity=50
yarn.scheduler.capacity.root.users.engineering.accessible-node-labels.nonllap.capacity=50
yarn.scheduler.capacity.root.users.engineering.accessible-node-labels.nonllap.maximum-capacity=50
yarn.scheduler.capacity.root.users.engineering.acl_administer_queue=*
yarn.scheduler.capacity.root.users.engineering.acl_submit_applications=*
yarn.scheduler.capacity.root.users.engineering.capacity=50
yarn.scheduler.capacity.root.users.engineering.maximum-capacity=80
yarn.scheduler.capacity.root.users.engineering.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.users.engineering.ordering-policy=fifo
yarn.scheduler.capacity.root.users.engineering.priority=0
yarn.scheduler.capacity.root.users.engineering.state=RUNNING
yarn.scheduler.capacity.root.users.engineering.user-limit-factor=1
yarn.scheduler.capacity.root.users.maximum-capacity=80
yarn.scheduler.capacity.root.users.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.users.priority=0
yarn.scheduler.capacity.root.users.queues=analyst,engineering
yarn.scheduler.capacity.root.users.state=RUNNING
yarn.scheduler.capacity.root.users.user-limit-factor=1
{code}


*Why it breaks:* The SA code is not aware of Node labelling. 
Thus, when it tries to calculate the capacity of the queue selected for LLAP 
(for example, the *'llap'* queue), it does the following:

It looks for the line showing the capacity of the *'llap'* queue and fetches 
this line:
{code}
yarn.scheduler.capacity.root.llap.accessible-node-labels.llap.capacity=100
{code}

- It then looks up the memory percentage for each of the queues [root, 
accessible-node-labels, llap].
- But *accessible-node-labels* is not a queue, so it has no .capacity entry.
- Thus the walkthrough fails.



*Fix:*

Added code to skip entries containing *accessible-node-labels*, which indicate 
that YARN Node Labelling is enabled.




> Stack Advisor and LLAP. Update Stack Advisor's capacity-scheduler walk 
> through to ignore YARN Node labelling string "accessible-node-labels" for 
> queues.
> --------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: AMBARI-23270
>                 URL: https://issues.apache.org/jira/browse/AMBARI-23270
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>            Reporter: Swapan Shridhar
>            Assignee: Swapan Shridhar
>            Priority: Major
>             Fix For: trunk, 2.7.0
>
>
> *Issue:* The Stack Advisor (SA) call returns a 500 error during LLAP 
> calculations when *YARN Node labelling* is enabled and its settings make 
> their way into *capacity-scheduler*. SA walks through *capacity-scheduler* 
> to determine the capacity of the queue used by LLAP, which it needs for the 
> LLAP calculations.
> When YARN Node labelling is enabled, the capacity-scheduler looks like this 
> (note the presence of the string *"accessible-node-labels"*):
> {code:title=capacity-scheduler with YARN Node Labelling enabled}
> yarn.scheduler.capacity.maximum-am-resource-percent=0.4
> yarn.scheduler.capacity.maximum-applications=10000
> yarn.scheduler.capacity.node-locality-delay=40
> yarn.scheduler.capacity.root.accessible-node-labels=nonllap,lowmem,llap
> yarn.scheduler.capacity.root.acl_administer_queue=*
> yarn.scheduler.capacity.root.capacity=100
> yarn.scheduler.capacity.root.default.acl_submit_applications=*
> yarn.scheduler.capacity.root.default.capacity=5
> yarn.scheduler.capacity.root.default.maximum-capacity=10
> yarn.scheduler.capacity.root.default.state=RUNNING
> yarn.scheduler.capacity.root.default.user-limit-factor=1
> yarn.scheduler.capacity.root.queues=default,llap,users
> yarn.scheduler.capacity.queue-mappings-override.enable=false
> yarn.scheduler.capacity.root.accessible-node-labels.llap.capacity=100
> yarn.scheduler.capacity.root.accessible-node-labels.llap.maximum-capacity=100
> yarn.scheduler.capacity.root.accessible-node-labels.lowmem.capacity=100
> yarn.scheduler.capacity.root.accessible-node-labels.lowmem.maximum-capacity=100
> yarn.scheduler.capacity.root.accessible-node-labels.nonllap.capacity=100
> yarn.scheduler.capacity.root.accessible-node-labels.nonllap.maximum-capacity=100
> yarn.scheduler.capacity.root.default.accessible-node-labels=nonllap,lowmem
> yarn.scheduler.capacity.root.default.accessible-node-labels.lowmem.capacity=20
> yarn.scheduler.capacity.root.default.accessible-node-labels.lowmem.maximum-capacity=20
> yarn.scheduler.capacity.root.default.accessible-node-labels.nonllap.capacity=20
> yarn.scheduler.capacity.root.default.accessible-node-labels.nonllap.maximum-capacity=20
> yarn.scheduler.capacity.root.default.default-node-label-expression=nonllap
> yarn.scheduler.capacity.root.default.priority=0
> yarn.scheduler.capacity.root.llap.accessible-node-labels=llap
> yarn.scheduler.capacity.root.llap.accessible-node-labels.llap.capacity=100
> yarn.scheduler.capacity.root.llap.accessible-node-labels.llap.maximum-capacity=100
> yarn.scheduler.capacity.root.llap.acl_administer_queue=*
> yarn.scheduler.capacity.root.llap.acl_submit_applications=*
> yarn.scheduler.capacity.root.llap.capacity=90
> yarn.scheduler.capacity.root.llap.default-node-label-expression=llap
> yarn.scheduler.capacity.root.llap.maximum-capacity=90
> yarn.scheduler.capacity.root.llap.minimum-user-limit-percent=100
> yarn.scheduler.capacity.root.llap.ordering-policy=fifo
> yarn.scheduler.capacity.root.llap.priority=0
> yarn.scheduler.capacity.root.llap.state=RUNNING
> yarn.scheduler.capacity.root.llap.user-limit-factor=1
> yarn.scheduler.capacity.root.maximum-capacity=100
> yarn.scheduler.capacity.root.priority=0
> yarn.scheduler.capacity.root.users.accessible-node-labels=nonllap,lowmem
> yarn.scheduler.capacity.root.users.accessible-node-labels.lowmem.capacity=80
> yarn.scheduler.capacity.root.users.accessible-node-labels.lowmem.maximum-capacity=80
> yarn.scheduler.capacity.root.users.accessible-node-labels.nonllap.capacity=80
> yarn.scheduler.capacity.root.users.accessible-node-labels.nonllap.maximum-capacity=80
> yarn.scheduler.capacity.root.users.acl_administer_queue=*
> yarn.scheduler.capacity.root.users.acl_submit_applications=*
> yarn.scheduler.capacity.root.users.analyst.accessible-node-labels=nonllap,lowmem
> yarn.scheduler.capacity.root.users.analyst.accessible-node-labels.lowmem.capacity=50
> yarn.scheduler.capacity.root.users.analyst.accessible-node-labels.lowmem.maximum-capacity=50
> yarn.scheduler.capacity.root.users.analyst.accessible-node-labels.nonllap.capacity=50
> yarn.scheduler.capacity.root.users.analyst.accessible-node-labels.nonllap.maximum-capacity=50
> yarn.scheduler.capacity.root.users.analyst.acl_administer_queue=*
> yarn.scheduler.capacity.root.users.analyst.acl_submit_applications=*
> yarn.scheduler.capacity.root.users.analyst.capacity=50
> yarn.scheduler.capacity.root.users.analyst.maximum-capacity=80
> yarn.scheduler.capacity.root.users.analyst.minimum-user-limit-percent=100
> yarn.scheduler.capacity.root.users.analyst.ordering-policy=fifo
> yarn.scheduler.capacity.root.users.analyst.priority=0
> yarn.scheduler.capacity.root.users.analyst.state=RUNNING
> yarn.scheduler.capacity.root.users.analyst.user-limit-factor=1
> yarn.scheduler.capacity.root.users.capacity=5
> yarn.scheduler.capacity.root.users.default-node-label-expression=nonllap
> yarn.scheduler.capacity.root.users.engineering.accessible-node-labels=nonllap,lowmem
> yarn.scheduler.capacity.root.users.engineering.accessible-node-labels.lowmem.capacity=50
> yarn.scheduler.capacity.root.users.engineering.accessible-node-labels.lowmem.maximum-capacity=50
> yarn.scheduler.capacity.root.users.engineering.accessible-node-labels.nonllap.capacity=50
> yarn.scheduler.capacity.root.users.engineering.accessible-node-labels.nonllap.maximum-capacity=50
> yarn.scheduler.capacity.root.users.engineering.acl_administer_queue=*
> yarn.scheduler.capacity.root.users.engineering.acl_submit_applications=*
> yarn.scheduler.capacity.root.users.engineering.capacity=50
> yarn.scheduler.capacity.root.users.engineering.maximum-capacity=80
> yarn.scheduler.capacity.root.users.engineering.minimum-user-limit-percent=100
> yarn.scheduler.capacity.root.users.engineering.ordering-policy=fifo
> yarn.scheduler.capacity.root.users.engineering.priority=0
> yarn.scheduler.capacity.root.users.engineering.state=RUNNING
> yarn.scheduler.capacity.root.users.engineering.user-limit-factor=1
> yarn.scheduler.capacity.root.users.maximum-capacity=80
> yarn.scheduler.capacity.root.users.minimum-user-limit-percent=100
> yarn.scheduler.capacity.root.users.priority=0
> yarn.scheduler.capacity.root.users.queues=analyst,engineering
> yarn.scheduler.capacity.root.users.state=RUNNING
> yarn.scheduler.capacity.root.users.user-limit-factor=1
> {code}
> *Why it breaks:* The SA code is not aware of Node labelling. 
> Thus, when it tries to calculate the capacity of the queue selected for LLAP 
> (for example, the *'llap'* queue), it does the following:
> It looks for the line showing the capacity of the *'llap'* queue and fetches 
> this line:
> {code}
> yarn.scheduler.capacity.root.llap.accessible-node-labels.llap.capacity=100
> {code}
> - It then looks up the memory percentage for each of the queues [root, 
> accessible-node-labels, llap].
> - But *accessible-node-labels* is not a queue, so it has no .capacity entry.
> - Thus the walkthrough fails.
> *Fix:*
> Added code to skip entries containing *accessible-node-labels*, which 
> indicate that YARN Node Labelling is enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
