[jira] [Commented] (AMBARI-20392) Get aggregate metric records from HBase encounters performance issues

Chuan Jin (JIRA) Fri, 10 Mar 2017 02:36:30 -0800

    [ https://issues.apache.org/jira/browse/AMBARI-20392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15904867#comment-15904867 ]


Chuan Jin commented on AMBARI-20392:
------------------------------------

Below is my original query:

{code:sql}
0: jdbc:phoenix:my-zk >  select count(1)
. . . . . . . . . . . >  FROM METRIC_AGGREGATE 
. . . . . . . . . . . >  WHERE METRIC_NAME IN ('pkts_out','pkts_in','cpu_wio', 
'cpu_idle', 'cpu_nice','cpu_user', 'cpu_system','mem_total','mem_free', 
'yarn.NodeManagerMetrics.ContainersCompleted', 
'yarn.NodeManagerMetrics.ContainersRunning', 
'yarn.NodeManagerMetrics.ContainersFailed', 
'yarn.NodeManagerMetrics.ContainersLaunched', 
'yarn.NodeManagerMetrics.ContainersKilled', 
'yarn.NodeManagerMetrics.ContainersIniting')
. . . . . . . . . . . >  AND APP_ID = 'nodemanager'
. . . . . . . . . . . >  AND SERVER_TIME >= 1489121698000
. . . . . . . . . . . >  AND SERVER_TIME < 1489125298000;
+-----------+
| COUNT(1)  |
+-----------+
| 1800      |
+-----------+
1 row selected (37.821 seconds)
{code}
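The issue description below suspects that this query makes the coprocessor scan several regions across different RegionServers. One way to check is to ask Phoenix for its execution plan with the EXPLAIN statement. Below is a minimal sketch over plain JDBC. The class and method names are hypothetical. The plan wording varies between Phoenix versions, so the rows are returned as-is rather than parsed.

{code:java}
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: fetches the Phoenix execution plan of a SELECT. */
public final class PhoenixPlanInspector {

    private PhoenixPlanInspector() {}

    /** Prefixes the query with EXPLAIN, dropping a trailing sqlline-style ';'. */
    public static String explainStatement(String selectSql) {
        String sql = selectSql.trim();
        if (sql.endsWith(";")) {
            sql = sql.substring(0, sql.length() - 1).trim();
        }
        return "EXPLAIN " + sql;
    }

    /** Runs EXPLAIN and returns the first column of every plan row. */
    public static List<String> explain(Connection conn, String selectSql) throws SQLException {
        List<String> plan = new ArrayList<>();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(explainStatement(selectSql))) {
            while (rs.next()) {
                plan.add(rs.getString(1));
            }
        }
        return plan;
    }
}
{code}

Passing the slow query's text and reading the returned lines for terms such as SKIP SCAN, RANGE SCAN or FULL SCAN should show how the IN list over METRIC_NAME is turned into scans.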

I then split the original query into four smaller queries:

{code:sql}
0: jdbc:phoenix:my-zk >  SELECT count(1)
. . . . . . . . . . . >  FROM METRIC_AGGREGATE 
. . . . . . . . . . . >  WHERE METRIC_NAME IN ('pkts_out','pkts_in')
. . . . . . . . . . . >  AND APP_ID = 'nodemanager'
. . . . . . . . . . . >  AND SERVER_TIME >= 1489121698000
. . . . . . . . . . . >  AND SERVER_TIME < 1489125298000;
+-----------+
| COUNT(1)  |
+-----------+
| 240       |
+-----------+
1 row selected (0.142 seconds)


0: jdbc:phoenix:my-zk >  SELECT count(1)
. . . . . . . . . . . >  FROM METRIC_AGGREGATE 
. . . . . . . . . . . >  WHERE METRIC_NAME IN ('cpu_wio', 'cpu_idle', 
'cpu_nice','cpu_user', 'cpu_system')
. . . . . . . . . . . >  AND APP_ID = 'nodemanager'
. . . . . . . . . . . >  AND SERVER_TIME >= 1489121698000
. . . . . . . . . . . >  AND SERVER_TIME < 1489125298000;
+-----------+
| COUNT(1)  |
+-----------+
| 600       |
+-----------+
1 row selected (0.266 seconds)


0: jdbc:phoenix:my-zk >  SELECT count(1)
. . . . . . . . . . . >  FROM METRIC_AGGREGATE 
. . . . . . . . . . . >  WHERE METRIC_NAME IN ('mem_total','mem_free')
. . . . . . . . . . . >  AND APP_ID = 'nodemanager'
. . . . . . . . . . . >  AND SERVER_TIME >= 1489121698000
. . . . . . . . . . . >  AND SERVER_TIME < 1489125298000;
+-----------+
| COUNT(1)  |
+-----------+
| 240       |
+-----------+
1 row selected (0.12 seconds)

0: jdbc:phoenix:my-zk >  SELECT count(1)
. . . . . . . . . . . >  FROM METRIC_AGGREGATE 
. . . . . . . . . . . >  WHERE METRIC_NAME IN 
('yarn.NodeManagerMetrics.ContainersCompleted', 
'yarn.NodeManagerMetrics.ContainersRunning', 
'yarn.NodeManagerMetrics.ContainersFailed', 
'yarn.NodeManagerMetrics.ContainersLaunched', 
'yarn.NodeManagerMetrics.ContainersKilled', 
'yarn.NodeManagerMetrics.ContainersIniting')
. . . . . . . . . . . >  AND APP_ID = 'nodemanager'
. . . . . . . . . . . >  AND SERVER_TIME >= 1489121698000
. . . . . . . . . . . >  AND SERVER_TIME < 1489125298000;
+-----------+
| COUNT(1)  |
+-----------+
| 720       |
+-----------+
1 row selected (0.154 seconds)
{code}
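The per-batch counts add up to the single query's result (240 + 600 + 240 + 720 = 1800), so the split loses no rows. Below is a minimal sketch of doing the same split on the client side, assuming plain JDBC against the Phoenix driver. The class `MetricQueryBatcher`, its methods and the `batchSize` parameter are hypothetical and are not part of the AMS collector. The sketch uses fixed-size chunks instead of the grouping by metric family used above.

{code:java}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical helper (not part of the AMS collector): splits a long
 * METRIC_NAME IN (...) list into smaller batches and runs one query per batch.
 */
public final class MetricQueryBatcher {

    private MetricQueryBatcher() {}

    /** Splits names into consecutive chunks of at most batchSize elements. */
    public static List<List<String>> partition(List<String> names, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < names.size(); i += batchSize) {
            int end = Math.min(i + batchSize, names.size());
            batches.add(new ArrayList<>(names.subList(i, end)));
        }
        return batches;
    }

    /** COUNT(1) query with one '?' per metric name, then APP_ID and the time range. */
    public static String buildCountQuery(int metricCount) {
        String placeholders = String.join(",", Collections.nCopies(metricCount, "?"));
        return "SELECT COUNT(1) FROM METRIC_AGGREGATE"
                + " WHERE METRIC_NAME IN (" + placeholders + ")"
                + " AND APP_ID = ?"
                + " AND SERVER_TIME >= ? AND SERVER_TIME < ?";
    }

    /**
     * Runs one COUNT(1) per batch and sums the results. The sum equals the
     * single-query count only if the input list has no duplicate names.
     */
    public static long countInBatches(Connection conn, List<String> names, String appId,
                                      long startMs, long endMs, int batchSize) throws SQLException {
        long total = 0;
        for (List<String> batch : partition(names, batchSize)) {
            try (PreparedStatement ps = conn.prepareStatement(buildCountQuery(batch.size()))) {
                int idx = 1;
                for (String name : batch) {
                    ps.setString(idx++, name);
                }
                ps.setString(idx++, appId);
                ps.setLong(idx++, startMs);
                ps.setLong(idx, endMs);
                try (ResultSet rs = ps.executeQuery()) {
                    if (rs.next()) {
                        total += rs.getLong(1);
                    }
                }
            }
        }
        return total;
    }
}
{code}

With the 15 metric names above and a batchSize of 5, this issues three queries instead of one. A suitable batch size, and whether it keeps requests under the 5s timeout, would have to be found by measurement.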

> Get aggregate metric records from HBase encounters performance issues
> ---------------------------------------------------------------------
>
>                 Key: AMBARI-20392
>                 URL: https://issues.apache.org/jira/browse/AMBARI-20392
>             Project: Ambari
>          Issue Type: Improvement
>          Components: ambari-metrics
>    Affects Versions: 2.4.2
>            Reporter: Chuan Jin
>
> I have a mini cluster (~6 nodes) managed by Ambari, and use a distributed
> HBase (~3 nodes) to hold the metrics collected from these nodes. After
> deploying the YARN service, I noticed that some widgets (Cluster Memory,
> Cluster Disk, ...) do not display properly on the YARN service dashboard page,
> and Ambari Server logs continuous timeout exceptions complaining that it
> cannot get timeline metrics because the connection was refused.
> The request timeout is 5s, which means the query fetching metrics from HBase
> takes longer than that. When I log in with the Phoenix shell and run the same
> query against HBase, it takes nearly 30s to finish. But if I split the big
> query into smaller pieces, i.e. use fewer values for the "metric_name" field
> in the WHERE ... IN clause, the results of the several small queries come
> back within 1s.
> Query performance in HBase depends heavily on the row key design and on
> querying in a way that uses it properly. When getting aggregate metrics, the
> AMS collector queries the METRIC_AGGREGATE table in a way that may cause the
> coprocessor to scan several regions across different RegionServers (RS).
> Adding more metrics to the service dashboard will make this worse.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
