[ 
https://issues.apache.org/jira/browse/AMBARI-12485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001976#comment-15001976
 ] 

Massimiliano Nigrelli commented on AMBARI-12485:
------------------------------------------------

It worked for me!!!

Scenario: A cluster (HDP 2.3) of 5 CentOS7 VMs with Ambari agent 2.1.2 
installed.

I had the same issue, which arose after the first node reboot. The node was 
marked with the red icon although all the services were up and running.

I removed 
/var/lib/ambari-agent/data/structured-out-status.json
and the node was immediately marked with the green icon.
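
For anyone who would rather check the file before deleting it, here is a 
minimal sketch in Python. It assumes the only problem is a file that 
json.load() cannot parse, as in the traceback quoted below. The helper name 
remove_if_unparseable is my own and is not part of Ambari.

```python
import json
import os


def remove_if_unparseable(path):
    """Delete `path` only if it exists and does not contain valid JSON.

    Returns True if the file was removed, False otherwise.
    (Hypothetical helper for illustration; not part of Ambari.)
    """
    if not os.path.isfile(path):
        return False
    try:
        with open(path) as fp:
            json.load(fp)
    except ValueError:
        # An empty or corrupt file makes json.load raise ValueError
        # (json.JSONDecodeError is a ValueError subclass on Python 3).
        os.remove(path)
        return True
    return False


if __name__ == "__main__":
    # Path taken from the report; run as a user allowed to delete it.
    path = "/var/lib/ambari-agent/data/structured-out-status.json"
    print("removed" if remove_if_unparseable(path) else "left in place")
```

A file that parses correctly is left untouched, so this is safe to run on a 
healthy node.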

Thank you Alex!!!

> Ambari agent stopped reporting status until some file was deleted
> -----------------------------------------------------------------
>
>                 Key: AMBARI-12485
>                 URL: https://issues.apache.org/jira/browse/AMBARI-12485
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-agent
>    Affects Versions: 2.0.0
>         Environment: Centos6
>            Reporter: Alex Piggott
>
> 1) I restarted YARN after making a config change, and observed that one of 
> the 4 nodes of the cluster (call it db001) was not restarting any of its 
> YARN components.
> 2) I restarted ambari-agent on db001 from the command line, at which point 
> all services remained shown as down (red)
> 3) Note that I _was_ then able to restart the YARN components on db001
> 4) I found the following error message being generated every minute:
> {code}
> [root@db001 ~]# more /var/lib/ambari-agent/data/status_command_stderr.txt
> Traceback (most recent call last):
>   File "/var/lib/ambari-agent/cache/common-services/ZOOKEEPER/3.4.5.2.0/package/scripts/zookeeper_client.py", line 67, in <module>
>     ZookeeperClient().execute()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 181, in execute
>     self.load_structured_out()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 109, in load_structured_out
>     Script.structuredOut = json.load(fp)
>   File "/usr/lib64/python2.6/json/__init__.py", line 267, in load
>     parse_constant=parse_constant, **kw)
>   File "/usr/lib64/python2.6/json/__init__.py", line 307, in loads
>     return _default_decoder.decode(s)
>   File "/usr/lib64/python2.6/json/decoder.py", line 319, in decode
>     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
>   File "/usr/lib64/python2.6/json/decoder.py", line 338, in raw_decode
>     raise ValueError("No JSON object could be decoded")
> ValueError: No JSON object could be decoded
> {code}
> Files:
> {code}
> -rw-r--r-- 1 root root     0 Jul 21 16:50 status_command_stdout.txt
> -rw------- 1 root root 18310 Jul 21 16:50 status_command.json
> -rw-r--r-- 1 root root  1008 Jul 21 16:50 status_command_stderr.txt
> {code}
> I stuck some print statements in the python (!!) and found out that the 
> failing file was an empty file not modified since Jul 19 (today==Jul 21):
> {code}
> [root@db001 data]# ls -l /var/lib/ambari-agent/data/structured-out-status.json
> -rw-rw-rw- 1 root root 0 Jul 19 01:22 /var/lib/ambari-agent/data/structured-out-status.json
> {code}
> Upon deleting that, the error messages went away, and Ambari showed all 
> components as green again.
> Note that nobody had touched the cluster since July 14.
> Hope this report is of some use!
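
The traceback quoted above comes down to json.load() being handed a 
zero-byte file. Below is a minimal sketch of a loader that tolerates this, 
assuming it is acceptable to treat a missing, empty, or corrupt file as an 
empty dict. The name load_json_or_default is hypothetical; this is not the 
code in Ambari's script.py and not necessarily how the issue was or will be 
fixed.

```python
import json


def load_json_or_default(path, default=None):
    """Parse the JSON file at `path`; fall back to `default` on failure.

    Hypothetical helper for illustration only. A missing file raises
    IOError (OSError on Python 3) and an empty or corrupt one raises
    ValueError; both cases return the fallback instead of propagating.
    """
    if default is None:
        default = {}
    try:
        with open(path) as fp:
            return json.load(fp)
    except (IOError, ValueError):
        return default
```

With a loader like this, a truncated status file would be ignored rather than 
making every status command fail.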



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
