Srikanth Janardhan created AMBARI-24179:
-------------------------------------------
Summary: Ambari Metrics Service check fails after deleting a host
Key: AMBARI-24179
URL: https://issues.apache.org/jira/browse/AMBARI-24179
Project: Ambari
Issue Type: Bug
Components: ambari-metrics
Affects Versions: 2.7.0
Reporter: Srikanth Janardhan
Assignee: Dmytro Sen
Fix For: 2.7.0
The Ambari Metrics service check failed immediately after deleting a host:
{code:java}
stderr:
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/AMBARI_METRICS/package/scripts/service_check.py", line 304, in <module>
    AMSServiceCheck().execute()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 353, in execute
    method(env)
  File "/usr/lib/ambari-agent/lib/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/AMBARI_METRICS/package/scripts/service_check.py", line 184, in service_check
    raise Fail("All metrics collectors are unavailable.")
resource_management.core.exceptions.Fail: All metrics collectors are unavailable.
stdout:
2018-06-25 04:42:25,088 - Using hadoop conf dir:
/usr/hdp/3.0.0.0-1541/hadoop/conf
2018-06-25 04:42:25,095 - checked_call['hostid'] {}
2018-06-25 04:42:25,100 - checked_call returned (0, '1bac1213')
2018-06-25 04:42:25,102 - Ambari Metrics service check was started.
2018-06-25 04:42:25,121 - Generated metrics for host
ctr-e138-1518143905142-378410-01-000009.hwx.site :
{
  "metrics": [
    {
      "metricname": "AMBARI_METRICS.SmokeTest.FakeMetric",
      "appid": "amssmoketestfake",
      "hostname": "ctr-e138-1518143905142-378410-01-000012.hwx.site",
      "starttime": 1529901745000,
      "metrics": {
        "1529901745000": 0.602995821312,
        "1529901746000": 1529901745000
      }
    }
  ]
}
2018-06-25 04:42:25,122 - Connecting (POST) to
ctr-e138-1518143905142-378410-01-000009.hwx.site:6188/ws/v1/timeline/metrics/
2018-06-25 04:42:25,132 - Http response for host
ctr-e138-1518143905142-378410-01-000009.hwx.site: 200 OK
2018-06-25 04:42:25,133 - Http data:
2018-06-25 04:42:25,133 - Metrics were saved.
2018-06-25 04:42:25,133 - Connecting (GET) to
ctr-e138-1518143905142-378410-01-000009.hwx.site:6188/ws/v1/timeline/metrics?metricNames=AMBARI_METRICS.SmokeTest.FakeMetric&hostname=ctr-e138-1518143905142-378410-01-000012.hwx.site&precision=seconds&grouped=false&startTime=1529901685000&appId=amssmoketestfake&endTime=1529901806000
2018-06-25 04:42:25,138 - Http response for host
ctr-e138-1518143905142-378410-01-000009.hwx.site : 200 OK
2018-06-25 04:42:25,138 - Http data: {"metrics":[]}
2018-06-25 04:42:25,138 - Metrics were retrieved from host
ctr-e138-1518143905142-378410-01-000009.hwx.site
2018-06-25 04:42:25,139 - Values weren't stored yet. Retrying in 10 seconds.
2018-06-25 04:42:35,154 - Http response for host
ctr-e138-1518143905142-378410-01-000009.hwx.site : 200 OK
2018-06-25 04:42:35,154 - Http data: {"metrics":[]}
2018-06-25 04:42:35,154 - Metrics were retrieved from host
ctr-e138-1518143905142-378410-01-000009.hwx.site
2018-06-25 04:42:35,155 - Values weren't stored yet. Retrying in 10 seconds.
2018-06-25 04:42:45,170 - Http response for host
ctr-e138-1518143905142-378410-01-000009.hwx.site : 200 OK
2018-06-25 04:42:45,170 - Http data: {"metrics":[]}
2018-06-25 04:42:45,171 - Metrics were retrieved from host
ctr-e138-1518143905142-378410-01-000009.hwx.site
2018-06-25 04:42:45,171 - Values weren't stored yet. Retrying in 10 seconds.
2018-06-25 04:42:55,186 - Http response for host
ctr-e138-1518143905142-378410-01-000009.hwx.site : 200 OK
2018-06-25 04:42:55,186 - Http data: {"metrics":[]}
2018-06-25 04:42:55,187 - Metrics were retrieved from host
ctr-e138-1518143905142-378410-01-000009.hwx.site
2018-06-25 04:42:55,187 - Values weren't stored yet. Retrying in 10 seconds.
2018-06-25 04:43:05,204 - Http response for host
ctr-e138-1518143905142-378410-01-000009.hwx.site : 200 OK
2018-06-25 04:43:05,204 - Http data: {"metrics":[]}
2018-06-25 04:43:05,205 - Metrics were retrieved from host
ctr-e138-1518143905142-378410-01-000009.hwx.site
2018-06-25 04:43:05,205 - Ambari Metrics service check failed on collector host
ctr-e138-1518143905142-378410-01-000009.hwx.site. Reason : Values
0.602995821312 and 1529901745000 were not found in the response.
2018-06-25 04:43:05,207 - Exception while running function '>' for
'ctr-e138-1518143905142-378410-01-000009.hwx.site'. Reason : Ambari Metrics
service check failed on collector host
ctr-e138-1518143905142-378410-01-000009.hwx.site. Reason : Values
0.602995821312 and 1529901745000 were not found in the response.
Command failed after 1 tries
{code}
*A subsequent service check passed, though.*
The issue does not appear to be related to the host deletion; it seems that the
data posted during the service check is sometimes not saved by the collector.
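For readers unfamiliar with the check, the flow visible in the log above (POST
two fake data points, then GET them back up to five times, 10 seconds apart)
can be sketched roughly as follows. This is only an illustration reconstructed
from the log output, not the actual service_check.py code; the function names
and the injected fetch callable are hypothetical, and no HTTP request is made.

{code:python}
```python
import json
import time
from urllib.parse import urlencode

# Values taken from the log in this report.
METRIC_NAME = "AMBARI_METRICS.SmokeTest.FakeMetric"
APP_ID = "amssmoketestfake"


def build_payload(hostname, now_ms, random_value):
    """Build a two-point payload shaped like the one printed in the log."""
    return {
        "metrics": [{
            "metricname": METRIC_NAME,
            "appid": APP_ID,
            "hostname": hostname,
            "starttime": now_ms,
            "metrics": {
                str(now_ms): random_value,
                # In the log, the second point's value is the first timestamp.
                str(now_ms + 1000): now_ms,
            },
        }]
    }


def build_query(hostname, now_ms):
    """Build the GET query string; the window matches the one in the log
    (60 s before the first point, 61 s after it)."""
    return urlencode({
        "metricNames": METRIC_NAME,
        "hostname": hostname,
        "precision": "seconds",
        "grouped": "false",
        "startTime": now_ms - 60000,
        "appId": APP_ID,
        "endTime": now_ms + 61000,
    })


def values_present(response_body, expected):
    """Return True if every expected value appears in the response."""
    found = set()
    for metric in json.loads(response_body).get("metrics", []):
        found.update(metric.get("metrics", {}).values())
    return all(value in found for value in expected)


def poll_until_stored(fetch, expected, tries=5, delay_s=10, sleep=time.sleep):
    """Call fetch() up to `tries` times, sleeping between attempts.

    Returns the 1-based attempt number on success, or None if the values
    never showed up (the failure reported in this issue).
    """
    for attempt in range(1, tries + 1):
        if values_present(fetch(), expected):
            return attempt
        if attempt < tries:
            sleep(delay_s)
    return None
```
{code}

In the failing run, every GET returned {"metrics":[]}, so a loop of this shape
exhausts its attempts and the check reports the collector as unavailable even
though the POST itself returned 200 OK.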
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)