[ https://issues.apache.org/jira/browse/AMBARI-16913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15305131#comment-15305131 ]

Hadoop QA commented on AMBARI-16913:
------------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12806720/AMBARI-16913.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 6 new or modified test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:red}-1 core tests{color}.  The patch failed these unit tests in ambari-server:

                  org.apache.ambari.server.state.stack.ConfigUpgradeValidityTest
                  org.apache.ambari.server.controller.metrics.RestMetricsPropertyProviderTest

Test results: https://builds.apache.org/job/Ambari-trunk-test-patch/7041//testReport/
Console output: https://builds.apache.org/job/Ambari-trunk-test-patch/7041//console

This message is automatically generated.

> Web Client Requests Handled By Jetty Should Not Be Blocked By JMX Property Providers
> ------------------------------------------------------------------------------------
>
>                 Key: AMBARI-16913
>                 URL: https://issues.apache.org/jira/browse/AMBARI-16913
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.0.0
>            Reporter: Jonathan Hurley
>            Assignee: Jonathan Hurley
>            Priority: Blocker
>             Fix For: 2.4.0
>
>         Attachments: AMBARI-16913.patch
>
>
> Incoming requests from the web client (or from any REST API) will eventually 
> be routed to the property provider / subresource framework. It is here where 
> any JMX data is queried within the context of the REST request. In large 
> clusters, these requests can back up quite easily (even with a massive 
> threadpool), causing UX degradation in the web client:
> {code}
> Thread [qtp-ambari-client-38]
>       JMXPropertyProvider(ThreadPoolEnabledPropertyProvider).populateResources(Set<Resource>, Request, Predicate) line: 168
>       JMXPropertyProvider.populateResources(Set<Resource>, Request, Predicate) line: 156
>       StackDefinedPropertyProvider.populateResources(Set<Resource>, Request, Predicate) line: 200
>       ClusterControllerImpl.populateResources(Type, Set<Resource>, Request, Predicate) line: 155
>       QueryImpl.queryForResources() line: 407
>       QueryImpl.execute() line: 217
>       ReadHandler.handleRequest(Request) line: 69
>       GetRequest(BaseRequest).process() line: 145
> {code}
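> To make the blocking behavior concrete, a minimal, hypothetical sketch of the pattern in 
> the stack above (illustrative only, not Ambari's actual property provider code): the Jetty 
> request thread fans JMX fetches out to a pool and then blocks on every future before the 
> response can be rendered.
> {code}
> import java.util.ArrayList;
> import java.util.List;
> import java.util.concurrent.*;
> 
> public class BlockingJmxFanOut {
>   private final ExecutorService pool = Executors.newFixedThreadPool(25);
> 
>   // Runs on the Jetty request thread (qtp-ambari-client-*).
>   public List<String> populateResources(List<String> jmxUrls) throws Exception {
>     List<Future<String>> futures = new ArrayList<>();
>     for (String url : jmxUrls) {
>       futures.add(pool.submit(() -> fetchJmx(url)));  // one live JMX call per master
>     }
>     List<String> results = new ArrayList<>();
>     for (Future<String> f : futures) {
>       // The web client's request is held open until every endpoint answers or times out.
>       results.add(f.get(5, TimeUnit.SECONDS));
>     }
>     return results;
>   }
> 
>   private String fetchJmx(String url) {
>     return "{}";  // placeholder for the live HTTP call to a component's /jmx endpoint
>   }
> }
> {code}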
> Consider one of the calls made by the web client:
> {code}
> GET api/v1/clusters/c1/components/?
> ServiceComponentInfo/category=MASTER&
> fields=
> ServiceComponentInfo/service_name,
> host_components/HostRoles/display_name,
> host_components/HostRoles/host_name,
> host_components/HostRoles/state,
> host_components/HostRoles/maintenance_state,
> host_components/HostRoles/stale_configs,
> host_components/HostRoles/ha_state,
> host_components/HostRoles/desired_admin_state,
> host_components/metrics/jvm/memHeapUsedM,
> host_components/metrics/jvm/HeapMemoryMax,
> host_components/metrics/jvm/HeapMemoryUsed,
> host_components/metrics/jvm/memHeapCommittedM,
> host_components/metrics/mapred/jobtracker/trackers_decommissioned,
> host_components/metrics/cpu/cpu_wio,
> host_components/metrics/rpc/client/RpcQueueTime_avg_time,
> host_components/metrics/dfs/FSNamesystem/*,
> host_components/metrics/dfs/namenode/Version,
> host_components/metrics/dfs/namenode/LiveNodes,
> host_components/metrics/dfs/namenode/DeadNodes,
> host_components/metrics/dfs/namenode/DecomNodes,
> host_components/metrics/dfs/namenode/TotalFiles,
> host_components/metrics/dfs/namenode/UpgradeFinalized,
> host_components/metrics/dfs/namenode/Safemode,
> host_components/metrics/runtime/StartTime
> {code}
> This query essentially says: for every {{MASTER}} component, retrieve its 
> metrics. The problem is that in a large cluster there could be 100 masters, 
> yet the metrics being asked for apply only to the NameNode. As a result, the 
> JMX endpoints for all 100 masters are queried - *live* - as part of the 
> request.
> There are two inherent flaws with this approach:
> - Even with millisecond JMX response times, multiplying them by hundreds of 
> endpoints and adding parsing overhead causes a noticeable delay in the web 
> client, since the federated requests block the main UX request.
> - Although there is a threadpool which scales up to service these requests, 
> that only really works for a single user. With multiple users logged in, 
> you'd need hundreds upon hundreds of threads pulling in the same JMX data.
> This data should never be queried directly as part of the incoming REST 
> requests. Instead, an autonomous pool of threads should be constantly 
> retrieving these point-in-time metrics and updating a cache. The cache is 
> then used to service all live REST requests (a rough sketch follows the 
> list below).
> - On the first request to a resource, a cache miss occurs and no data is 
> returned. I think this is acceptable since metrics take a few moments to 
> populate anyway right now. As the web client polls, the next request should 
> pick up the newly cached metrics.
> - Only URLs which are being asked for by incoming REST requests should be 
> considered for retrieval. After some time, if they haven't been requested, 
> then the headless threadpool can stop trying to update their data.
> - All JMX data will be parsed and stored in-memory, in an expiring cache.
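> A minimal sketch of this design, under assumptions: the class, field, and interval names 
> below are illustrative only, not the attached patch. A small scheduled pool refreshes only 
> the JMX URLs that REST requests have recently asked for, while request threads read the 
> cache and never make a live JMX call.
> {code}
> import java.util.Map;
> import java.util.concurrent.*;
> 
> public class JmxMetricsCacheSketch {
>   private static final long TTL_MS = 30_000;       // treat cached data older than this as a miss
>   private static final long IDLE_MS = 5 * 60_000;  // stop refreshing URLs nobody has asked for
> 
>   private static class Entry {
>     volatile String json;
>     volatile long fetchedAt;
>     volatile long lastRequestedAt;
>   }
> 
>   private final Map<String, Entry> cache = new ConcurrentHashMap<>();
>   private final ScheduledExecutorService refresher = Executors.newScheduledThreadPool(4);
> 
>   public JmxMetricsCacheSketch() {
>     refresher.scheduleWithFixedDelay(this::refreshAll, 0, 10, TimeUnit.SECONDS);
>   }
> 
>   // Called on the REST request thread: cache-only. A miss returns null (no
>   // metrics on the first poll) but registers interest for the refresher.
>   public String getMetrics(String jmxUrl) {
>     Entry e = cache.computeIfAbsent(jmxUrl, k -> new Entry());
>     e.lastRequestedAt = System.currentTimeMillis();
>     if (e.json == null || System.currentTimeMillis() - e.fetchedAt > TTL_MS) {
>       return null;
>     }
>     return e.json;
>   }
> 
>   // Runs on the background pool, never on a Jetty request thread.
>   private void refreshAll() {
>     long now = System.currentTimeMillis();
>     cache.forEach((url, entry) -> {
>       if (now - entry.lastRequestedAt > IDLE_MS) {
>         cache.remove(url);  // interest expired; stop updating this URL
>         return;
>       }
>       entry.json = fetchJmx(url);  // the live JMX call happens here, off the request path
>       entry.fetchedAt = System.currentTimeMillis();
>     });
>   }
> 
>   private String fetchJmx(String url) {
>     return "{}";  // placeholder for the HTTP call to the component's /jmx endpoint
>   }
> }
> {code}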



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
