[ https://issues.apache.org/jira/browse/AMBARI-17589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aravindan Vijayan updated AMBARI-17589:
---------------------------------------
    Description: 
Ambari's architectural design is based on having a single master server with 
multiple agents.  Each agent sends a heartbeat every X seconds to the server to 
report its status; the server may reply with a list of commands to be run by 
each agent.

An operational cluster may have as many as 2000-4000 agents, and Ambari needs 
to be robust and performant at that scale.  Oftentimes, Ambari's overall 
performance depends on the cluster's environment, such as network latency and 
stability, Ambari database call latency, etc.  In such environments, detecting 
the cause of Ambari's sluggish performance and/or instability has proven to be 
difficult in practice.

Ambari should intercept and store the time and resources taken to serve 
requests.  This information can then be presented to the end user on Ambari Web 
and/or Grafana.
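A minimal sketch of such interception, assuming a Java server-side hook (Ambari Server is written in Java). The hypothetical `RequestTimingRecorder` wraps a request handler, measures elapsed wall-clock time with `System.nanoTime()`, and aggregates the count, total and maximum latency per request key. The class, its methods and the example key `/api/v1/clusters` are illustrative assumptions, not Ambari's actual API. The sketch records elapsed time only, not other resources such as CPU or memory.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAccumulator;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Hypothetical sketch, not Ambari's actual implementation: aggregates how long
// each kind of request takes to serve, keyed by an arbitrary string (e.g. a path).
public class RequestTimingRecorder {

    // Per-key aggregates; LongAdder/LongAccumulator keep updates cheap under contention.
    public static final class Stats {
        private final LongAdder count = new LongAdder();
        private final LongAdder totalNanos = new LongAdder();
        private final LongAccumulator maxNanos = new LongAccumulator(Long::max, 0L);

        void record(long elapsedNanos) {
            count.increment();
            totalNanos.add(elapsedNanos);
            maxNanos.accumulate(elapsedNanos);
        }

        public long count() { return count.sum(); }
        public long totalNanos() { return totalNanos.sum(); }
        public long maxNanos() { return maxNanos.get(); }

        public double meanMillis() {
            long n = count();
            return n == 0 ? 0.0 : totalNanos() / (double) n / 1_000_000.0;
        }
    }

    private final Map<String, Stats> statsByKey = new ConcurrentHashMap<>();

    // Runs the handler and records its elapsed wall-clock time under `key`,
    // including requests that fail with an exception.
    public <T> T time(String key, Supplier<T> handler) {
        long start = System.nanoTime();
        try {
            return handler.get();
        } finally {
            long elapsed = System.nanoTime() - start;
            statsByKey.computeIfAbsent(key, k -> new Stats()).record(elapsed);
        }
    }

    // Returns the aggregates for `key`, or null if no request was recorded.
    public Stats stats(String key) {
        return statsByKey.get(key);
    }

    // Simulates work in a handler; restores the interrupt flag if interrupted.
    private static void simulateWork(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        RequestTimingRecorder recorder = new RequestTimingRecorder();
        for (int i = 0; i < 3; i++) {
            recorder.time("/api/v1/clusters", () -> {
                simulateWork(5);
                return "ok";
            });
        }
        Stats s = recorder.stats("/api/v1/clusters");
        System.out.printf("requests=%d mean=%.1f ms max=%.1f ms%n",
                s.count(), s.meanMillis(), s.maxNanos() / 1_000_000.0);
    }
}
```

In a real server, `time(...)` would more likely be invoked from a single request filter so that every endpoint is covered without changing handler code, and the aggregated `Stats` would be exported periodically to a metrics backend for display in Ambari Web or Grafana. That wiring is an assumption and is not shown here.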

Optionally, this work can be extended to have Ambari Web persist the time taken 
to process the response of each API call, along with other performance 
characteristics.  This performance data from Ambari Web can likewise be 
presented to the end user via Ambari Web and/or Grafana.

> Capture & visualize metrics for Ambari Server
> ---------------------------------------------
>
>                 Key: AMBARI-17589
>                 URL: https://issues.apache.org/jira/browse/AMBARI-17589
>             Project: Ambari
>          Issue Type: Epic
>    Affects Versions: 3.0.0
>            Reporter: Aravindan Vijayan
>            Assignee: Yusaku Sako
>            Priority: Critical
>             Fix For: 3.0.0
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
