[
https://issues.apache.org/jira/browse/AMBARI-16913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jonathan Hurley updated AMBARI-16913:
-------------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
> Web Client Requests Handled By Jetty Should Not Be Blocked By JMX Property
> Providers
> ------------------------------------------------------------------------------------
>
> Key: AMBARI-16913
> URL: https://issues.apache.org/jira/browse/AMBARI-16913
> Project: Ambari
> Issue Type: Bug
> Components: ambari-server
> Affects Versions: 2.0.0
> Reporter: Jonathan Hurley
> Assignee: Jonathan Hurley
> Priority: Blocker
> Fix For: 2.4.0
>
> Attachments: AMBARI-16913.patch
>
>
> Incoming requests from the web client (or from any REST API) will eventually
> be routed to the property provider / subresource framework. It is here where
> any JMX data is queried within the context of the REST request. In large
> clusters, these requests can back up quite easily (even with a massive
> threadpool), causing UX degradation in the web client:
> {code}
> Thread [qtp-ambari-client-38]
>
> JMXPropertyProvider(ThreadPoolEnabledPropertyProvider).populateResources(Set<Resource>, Request, Predicate) line: 168
> JMXPropertyProvider.populateResources(Set<Resource>, Request, Predicate) line: 156
> StackDefinedPropertyProvider.populateResources(Set<Resource>, Request, Predicate) line: 200
> ClusterControllerImpl.populateResources(Type, Set<Resource>, Request, Predicate) line: 155
> QueryImpl.queryForResources() line: 407
> QueryImpl.execute() line: 217
> ReadHandler.handleRequest(Request) line: 69
> GetRequest(BaseRequest).process() line: 145
> {code}
> Consider one of the calls made by the web client:
> {code}
> GET api/v1/clusters/c1/components/?
> ServiceComponentInfo/category=MASTER&
> fields=
> ServiceComponentInfo/service_name,
> host_components/HostRoles/display_name,
> host_components/HostRoles/host_name,
> host_components/HostRoles/state,
> host_components/HostRoles/maintenance_state,
> host_components/HostRoles/stale_configs,
> host_components/HostRoles/ha_state,
> host_components/HostRoles/desired_admin_state,
> host_components/metrics/jvm/memHeapUsedM,
> host_components/metrics/jvm/HeapMemoryMax,
> host_components/metrics/jvm/HeapMemoryUsed,
> host_components/metrics/jvm/memHeapCommittedM,
> host_components/metrics/mapred/jobtracker/trackers_decommissioned,
> host_components/metrics/cpu/cpu_wio,
> host_components/metrics/rpc/client/RpcQueueTime_avg_time,
> host_components/metrics/dfs/FSNamesystem/*,
> host_components/metrics/dfs/namenode/Version,
> host_components/metrics/dfs/namenode/LiveNodes,
> host_components/metrics/dfs/namenode/DeadNodes,
> host_components/metrics/dfs/namenode/DecomNodes,
> host_components/metrics/dfs/namenode/TotalFiles,
> host_components/metrics/dfs/namenode/UpgradeFinalized,
> host_components/metrics/dfs/namenode/Safemode,
> host_components/metrics/runtime/StartTime
> {code}
> This query essentially says: for every {{MASTER}} component, retrieve its
> metrics. The problem is that in a large cluster there could be 100 masters,
> yet the metrics being asked for apply only to the NameNode. As a result, the
> JMX endpoints for all 100 masters are queried - *live* - as part of the
> request. There are two inherent flaws with this approach:
> - Even with millisecond JMX response times, multiplying this by hundreds of
> hosts and then adding parsing overhead causes a noticeable delay in the web
> client, since the fanned-out requests block the main UX request (see the
> sketch after this list).
> - Although there is a threadpool which scales up to service these requests,
> that only really works for one user. With multiple users logged in, you'd
> need hundreds upon hundreds of threads pulling in the same JMX data.
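> To make the blocking nature concrete, here is a simplified sketch (not the
> actual {{JMXPropertyProvider}} code) of what a request thread effectively
> does today: it fans JMX fetches out to an executor and then waits on the
> futures before the REST response can be written. The {{fetchJmx}} helper,
> host list, and pool size are hypothetical.
> {code}
> // Simplified illustration only -- not the actual JMXPropertyProvider code.
> import java.util.ArrayList;
> import java.util.List;
> import java.util.concurrent.*;
>
> public class BlockingJmxFanOut {
>   // placeholder for the real JMX-over-HTTP call and JSON parsing
>   static String fetchJmx(String host) throws Exception {
>     Thread.sleep(50);
>     return "{}";
>   }
>
>   public static void main(String[] args) throws Exception {
>     List<String> masterHosts = new ArrayList<>();
>     for (int i = 0; i < 100; i++) {
>       masterHosts.add("master-" + i + ".example.com");
>     }
>
>     ExecutorService pool = Executors.newFixedThreadPool(25);
>     List<Future<String>> futures = new ArrayList<>();
>     for (String host : masterHosts) {
>       futures.add(pool.submit(() -> fetchJmx(host)));
>     }
>
>     // The Jetty request thread (qtp-ambari-client-*) effectively blocks here
>     // until every JMX endpoint has responded or timed out; only then can the
>     // REST response be written back to the web client.
>     for (Future<String> f : futures) {
>       f.get(5, TimeUnit.SECONDS);
>     }
>     pool.shutdown();
>   }
> }
> {code}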
> This data should never be queried directly as part of the incoming REST
> requests. Instead, an autonomous pool of threads should constantly retrieve
> these point-in-time metrics and update a cache. The cache is then used to
> service all live REST requests (see the sketch after this list).
> - On the first request to a resource, a cache miss occurs and no data is
> returned. I think this is acceptable since metrics take a few moments to
> populate anyway right now. As the web client polls, the next request should
> pick up the newly cached metrics.
> - Only URLs which are being asked for by incoming REST requests should be
> considered for retrieval. After some time, if they haven't been requested,
> then the headless threadpool can stop trying to update their data.
> - All JMX data will be parsed and stored in-memory, in an expiring cache.
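> A minimal sketch of the proposed approach follows; the class and method
> names, refresh interval, and {{fetchJmx}} helper are hypothetical and not
> the final design. A headless scheduled pool keeps an in-memory cache fresh,
> the request threads only read from the cache, and URLs that haven't been
> requested recently are dropped from the refresh set.
> {code}
> // Hypothetical sketch only -- not the final implementation.
> import java.util.Map;
> import java.util.concurrent.*;
>
> public class JmxMetricCache {
>   private static final long EXPIRE_MS  = TimeUnit.MINUTES.toMillis(5);
>   private static final long REFRESH_MS = TimeUnit.SECONDS.toMillis(10);
>
>   // JMX URL -> last parsed payload (the expiring in-memory cache)
>   private final Map<String, String> cache = new ConcurrentHashMap<>();
>   // JMX URL -> last time a REST request asked for it
>   private final Map<String, Long> lastRequested = new ConcurrentHashMap<>();
>
>   private final ScheduledExecutorService refresher =
>       Executors.newScheduledThreadPool(4);
>
>   public JmxMetricCache() {
>     refresher.scheduleWithFixedDelay(this::refreshAll, 0, REFRESH_MS,
>         TimeUnit.MILLISECONDS);
>   }
>
>   /** Called from the Jetty request thread: never blocks on JMX. */
>   public String getMetrics(String jmxUrl) {
>     lastRequested.put(jmxUrl, System.currentTimeMillis());
>     return cache.get(jmxUrl); // null on the first request (cache miss)
>   }
>
>   /** Runs on the headless pool, off the request threads. */
>   private void refreshAll() {
>     long now = System.currentTimeMillis();
>     for (Map.Entry<String, Long> e : lastRequested.entrySet()) {
>       if (now - e.getValue() > EXPIRE_MS) {
>         // nobody has asked for this URL recently; stop refreshing it
>         lastRequested.remove(e.getKey());
>         cache.remove(e.getKey());
>       } else {
>         cache.put(e.getKey(), fetchJmx(e.getKey()));
>       }
>     }
>   }
>
>   // placeholder for the real JMX-over-HTTP call and JSON parsing
>   private String fetchJmx(String jmxUrl) {
>     return "{}";
>   }
> }
> {code}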
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)