Hello, I am running Slider version 0.80 with CDH 5.5.1.
I am investigating an instance where a Slider application errored out. The slider.agent.log for many components shows the trace below.

I noticed that the "hostname" key is actually the container name, e.g. "hostname": "container_e14_1513412386901_898934_03_000003___abc". The "fqdn" key shows the correct FQDN.

Any idea why the connection would be refused? The target of the EXECUTE command had started correctly and is on the same node.

Thanks,

WARNING 2018-02-24 02:28:53,933 Controller.py:628 - Request failed! Data: {"package": "", "nodeStatus": {"status": "HEALTHY", "cause": "NONE"}, "timestamp": 1519439333928, "hostname": "container_e14_1513412386901_898934_03_000003___abc", "responseId": 6, "fqdn": "<correct host name>", "reports": [{"status": "COMPLETED", "stderr": "None", "stdout": "2018-02-24 02:28:23,800 - Execute['XYZ'] {'pid_file': '/hadoop/disk10/yarn/logs/application_1513412386901_898934/container_e14_1513412386901_898934_03_000003/abc', 'wait_for_finish': False, 'logoutput': True, 'poll_after': 30}", "clusterName": "foo", "structuredOut": "{}", "allocatedPorts": {}, "roleCommand": "START", "serviceName": "foo", "role": "abc", "actionId": "15-1", "taskId": 15, "exitcode": 0}]}
INFO 2018-02-24 02:29:16,056 security.py:89 - SSL Connect being called.. connecting to the server
ERROR 2018-02-24 02:29:16,057 Controller.py:625 - Exception raised
Traceback (most recent call last):
  File "/hadoop/disk1/yarn/local/usercache/xxx/appcache/application_1513412386901_898934/filecache/10/slider-agent.tar.gz/slider-agent/agent/Controller.py", line 619, in sendRequest
    self.cachedconnect = security.CachedHTTPSConnection(self.config)
  File "/hadoop/disk1/yarn/local/usercache/xxx/appcache/application_1513412386901_898934/filecache/10/slider-agent.tar.gz/slider-agent/agent/security.py", line 106, in __init__
    self.connect()
  File "/hadoop/disk1/yarn/local/usercache/xxx/appcache/application_1513412386901_898934/filecache/10/slider-agent.tar.gz/slider-agent/agent/security.py", line 111, in connect
    self.httpsconn.connect()
  File "/hadoop/disk1/yarn/local/usercache/xxx/appcache/application_1513412386901_898934/filecache/10/slider-agent.tar.gz/slider-agent/agent/security.py", line 49, in connect
    sock=self.create_connection()
  File "/hadoop/disk1/yarn/local/usercache/xxx/appcache/application_1513412386901_898934/filecache/10/slider-agent.tar.gz/slider-agent/agent/security.py", line 90, in create_connection
    sock = socket.create_connection((self.host, self.port), 60)
  File "/usr/lib64/python2.6/socket.py", line 567, in create_connection
    raise error, msg
error: [Errno 111] Connection refused
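
For anyone who wants to reproduce this outside the agent: the traceback ends in a plain socket.create_connection((self.host, self.port), 60) call, so the refusal happens at the TCP level, before any SSL handshake. Below is a minimal sketch of a probe that repeats only that step. The function name probe and the host/port values in the comments are placeholders of mine, not anything taken from the Slider agent; substitute whatever host and port the agent is configured to report to.

```python
import socket


def probe(host, port, timeout=5.0):
    """Attempt a plain TCP connect, the same step that fails in the traceback.

    Returns (True, None) if the connect succeeds, otherwise
    (False, errno), where errno may be None for timeouts.
    """
    try:
        sock = socket.create_connection((host, port), timeout)
    except socket.error as exc:
        # Errno 111 (ECONNREFUSED) means nothing is listening on host:port.
        return False, exc.errno
    sock.close()
    return True, None


# Hypothetical usage; "am-host.example.com" and 8443 are placeholders:
#   ok, err = probe("am-host.example.com", 8443)
#   print("reachable=%s errno=%s" % (ok, err))
```

If the probe also reports errno 111 from the same node, it would suggest that nothing is listening on that port at that moment, rather than a certificate or SSL problem.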