[ https://issues.apache.org/jira/browse/AMBARI-12485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001976#comment-15001976 ]
Massimiliano Nigrelli commented on AMBARI-12485:
------------------------------------------------
It worked for me!!!
Scenario: A cluster (HDP 2.3) of 5 CentOS7 VMs with Ambari agent 2.1.2
installed.
I had the same issue, and it arose after the first node reboot. This node was
marked with the red icon although all the services were up and running.
I removed
/var/lib/ambari-agent/data/structured-out-status.json
and the node was immediately marked with the green icon.
Thank you, Alex!!!
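A minimal Python sketch of the workaround above, assuming the agent's data directory is /var/lib/ambari-agent/data as in this report. It removes structured-out-status.json only when the file is zero bytes or is not valid JSON, so a healthy file is left alone. The function names and the dry_run parameter are hypothetical helpers written for this note, not part of Ambari; dry_run defaults to True so nothing is deleted unless you ask for it.

```python
import json
import os

# Path reported in AMBARI-12485 (assumption: your agent uses the same data dir).
STRUCTURED_OUT_STATUS = "/var/lib/ambari-agent/data/structured-out-status.json"


def is_corrupt_structured_out(path=STRUCTURED_OUT_STATUS):
    """Return True if the file exists but is empty or not parseable as JSON."""
    if not os.path.isfile(path):
        return False  # nothing to clean up
    if os.path.getsize(path) == 0:
        return True  # the zero-byte case seen in this issue
    try:
        with open(path) as fp:
            json.load(fp)
    except ValueError:  # json.JSONDecodeError subclasses ValueError on Python 3
        return True
    return False


def remove_if_corrupt(path=STRUCTURED_OUT_STATUS, dry_run=True):
    """Delete the file only when it is empty or unparseable.

    Returns True if the file was removed (or would be, when dry_run is True).
    """
    if not is_corrupt_structured_out(path):
        return False
    if not dry_run:
        os.remove(path)
    return True
```

For example, remove_if_corrupt() only reports whether the file would be deleted, and remove_if_corrupt(dry_run=False) actually deletes it. Run it on the affected node with sufficient privileges; the files in this report are owned by root.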
> Ambari agent stopped reporting status until some file was deleted
> -----------------------------------------------------------------
>
> Key: AMBARI-12485
> URL: https://issues.apache.org/jira/browse/AMBARI-12485
> Project: Ambari
> Issue Type: Bug
> Components: ambari-agent
> Affects Versions: 2.0.0
> Environment: Centos6
> Reporter: Alex Piggott
>
> 1) I restarted YARN after making a config change and observed that one of
> the 4 nodes of a cluster (call it db001) was not restarting any of its YARN
> components.
> 2) I restarted ambari-agent on db001 from the command line, at which point
> all services were still shown as down (red).
> 3) Note that I _was_ then able to restart the YARN components on db001.
> 4) I found the following error message being generated every minute:
> {code}
> [root@db001 ~]# more /var/lib/ambari-agent/data/status_command_stderr.txt
> Traceback (most recent call last):
>   File "/var/lib/ambari-agent/cache/common-services/ZOOKEEPER/3.4.5.2.0/package/scripts/zookeeper_client.py", line 67, in <module>
>     ZookeeperClient().execute()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 181, in execute
>     self.load_structured_out()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 109, in load_structured_out
>     Script.structuredOut = json.load(fp)
>   File "/usr/lib64/python2.6/json/__init__.py", line 267, in load
>     parse_constant=parse_constant, **kw)
>   File "/usr/lib64/python2.6/json/__init__.py", line 307, in loads
>     return _default_decoder.decode(s)
>   File "/usr/lib64/python2.6/json/decoder.py", line 319, in decode
>     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
>   File "/usr/lib64/python2.6/json/decoder.py", line 338, in raw_decode
>     raise ValueError("No JSON object could be decoded")
> ValueError: No JSON object could be decoded
> {code}
> Related files in /var/lib/ambari-agent/data:
> {code}
> -rw-r--r-- 1 root root 0 Jul 21 16:50 status_command_stdout.txt
> -rw------- 1 root root 18310 Jul 21 16:50 status_command.json
> -rw-r--r-- 1 root root 1008 Jul 21 16:50 status_command_stderr.txt
> {code}
> I added some print statements to the Python code (!!) and found that the
> failing file was an empty file, not modified since Jul 19 (today is Jul 21):
> {code}
> [root@db001 data]# ls -l /var/lib/ambari-agent/data/structured-out-status.json
> -rw-rw-rw- 1 root root 0 Jul 19 01:22 /var/lib/ambari-agent/data/structured-out-status.json
> {code}
> Upon deleting that file, the error messages went away, and Ambari showed all
> components as green again.
> Note that nobody had touched the cluster since July 14.
> Hope this report is of some use!
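The traceback shows Script.load_structured_out() passing the open file straight to json.load(), which raises ValueError on a zero-byte file. The Python sketch below reproduces that with a temporary empty file and contrasts it with a hypothetical tolerant loader that treats a missing, empty, or unparseable file as {}. Both functions are illustrative stand-ins written for this note, not Ambari's code or its eventual fix. On Python 3 the message reads "Expecting value: line 1 column 1 (char 0)" instead of "No JSON object could be decoded", but json.JSONDecodeError is still a ValueError.

```python
import json
import os
import tempfile


def load_structured_out_strict(path):
    # Mirrors the failing call in the traceback: json.load() on the open file.
    with open(path) as fp:
        return json.load(fp)


def load_structured_out_tolerant(path):
    """Hypothetical defensive variant: a missing, empty, or unparseable
    file yields an empty dict instead of an exception."""
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return {}
    try:
        with open(path) as fp:
            return json.load(fp)
    except ValueError:
        return {}


def demo():
    # Create a zero-byte file, like the structured-out-status.json above.
    fd, path = tempfile.mkstemp(suffix=".json")
    os.close(fd)
    strict_result = None
    try:
        try:
            load_structured_out_strict(path)
        except ValueError as exc:
            strict_result = "ValueError: %s" % exc
        tolerant_result = load_structured_out_tolerant(path)
    finally:
        os.remove(path)
    return strict_result, tolerant_result
```

Returning {} keeps status reporting alive, but it also hides the corruption; a real fix would probably want to log a warning or rewrite the file as well.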
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)