I had to do some network reconfiguration on our cluster. After rebooting everything and restarting the Ambari server and the Ambari agents, the server reports (via the UI) that it is not receiving heartbeats. However, when I look at the server and agent logs, I see heartbeat activity:

agent:
INFO 2013-07-15 11:40:12,169 Heartbeat.py:61 - Sending heartbeat with response id: 251 and timestamp: 1373902812168
INFO 2013-07-15 11:40:12,214 Controller.py:176 - No commands sent from the Server.

server:
11:41:44,760 INFO HeartBeatHandler:108 - Received heartbeat from host, hostname=foo.net, currentResponseId=260, receivedResponseId=260
11:41:44,761 INFO AgentResource:109 - Sending heartbeat response with response id 261

(The response ids don't match because I didn't try to capture them in unison.)

I suspect there may be persisted state in the Postgres database from the previous network configuration that is causing the problem. Any suggestions for a fix short of a complete redeploy?

TIA
Brian
