[
https://issues.apache.org/jira/browse/METRON-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16323800#comment-16323800
]
ASF GitHub Bot commented on METRON-1326:
----------------------------------------
Github user anandsubbu commented on a diff in the pull request:
https://github.com/apache/metron/pull/894#discussion_r161184707
--- Diff: metron-deployment/packaging/ambari/metron-mpack/src/main/resources/common-services/ELASTICSEARCH/5.6.2/package/scripts/elastic_master.py ---
@@ -56,8 +59,28 @@ def status(self, env):
import params
env.set_params(params)
Logger.info('Check status of Elasticsearch master node')
- status_cmd = "service elasticsearch status"
- Execute(status_cmd)
+
+ # return codes defined by LSB
+ # http://refspecs.linuxbase.org/LSB_3.0.0/LSB-PDA/LSB-PDA/iniscrptact.html
+ cmd = ('service', 'elasticsearch', 'status')
+
+ rc, out = shell.call(cmd, sudo=True, quiet=False)
+
+ if rc == 3:
--- End diff --
On my 12-node CentOS 7 cluster, when I hit 'Stop Services', the Kibana
service stop still failed with the following error:
```
stderr: /var/lib/ambari-agent/data/errors-753.txt
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/KIBANA/5.6.2/package/scripts/kibana_master.py", line 153, in <module>
    Kibana().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 332, in execute
    self.execute_prefix_function(self.command_name, 'after', env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 350, in execute_prefix_function
    method(env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 398, in after_stop
    status_method(env)
  File "/var/lib/ambari-agent/cache/common-services/KIBANA/5.6.2/package/scripts/kibana_master.py", line 118, in status
    raise ExecutionFailed(err_msg, rc, out)
resource_management.core.exceptions.ExecutionFailed: Execution of 'service kibana status' returned 2
stdout: /var/lib/ambari-agent/data/output-753.txt
2018-01-12 10:08:11,166 - Stop Kibana Master
2018-01-12 10:08:11,166 - Execute['service kibana stop'] {}
2018-01-12 10:08:12,251 - Waiting for actual component stop
2018-01-12 10:08:12,251 - Status of the Master
2018-01-12 10:08:12,251 - call[('service', 'kibana', 'status')] {'sudo': True, 'quiet': False}
2018-01-12 10:08:12,285 - call returned (2, 'kibana is not running')
Command failed after 1 tries
```
This check only looks for exit status 3, but here `service kibana status`
returned 2. I made a local change to treat any non-zero exit status as
`ComponentIsNotRunning` and it worked fine.
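A minimal sketch of what a check along those lines could look like, for illustration only (this is not the actual local change or the code in the PR). It assumes a standalone helper named `check_status`, which is hypothetical, and it defines a local stand-in for Ambari's `ComponentIsNotRunning` exception so the snippet runs without the `resource_management` library. The table lists the `status` exit codes from the LSB page linked in the diff; exit code 2 there means "program is dead and /var/lock lock file exists", which is not covered by a check for 3 alone.

```python
# LSB init-script exit codes for the "status" action (LSB 3.0).
LSB_STATUS = {
    0: "program is running or service is OK",
    1: "program is dead and /var/run pid file exists",
    2: "program is dead and /var/lock lock file exists",
    3: "program is not running",
    4: "program or service status is unknown",
}


class ComponentIsNotRunning(Exception):
    """Local stand-in for the Ambari exception of the same name (assumption)."""


def check_status(rc, out=""):
    """Hypothetical helper: treat ANY non-zero status exit code as 'not running'."""
    if rc != 0:
        reason = LSB_STATUS.get(rc, "reserved or unrecognized exit code")
        raise ComponentIsNotRunning(
            "status returned %d (%s): %s" % (rc, reason, out)
        )


if __name__ == "__main__":
    # Values taken from the Kibana log above: call returned (2, 'kibana is not running')
    try:
        check_status(2, "kibana is not running")
    except ComponentIsNotRunning as e:
        print("component is not running: %s" % e)
```

With a check like this, both the Elasticsearch case (exit status 3) and the Kibana case (exit status 2) would be reported as a stopped component rather than as a failed command.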
> Metron deploy with Kerberos fails on Ambari 2.5 during ES service stop
> ----------------------------------------------------------------------
>
> Key: METRON-1326
> URL: https://issues.apache.org/jira/browse/METRON-1326
> Project: Metron
> Issue Type: Bug
> Environment: 12 node VM cluster running CentOS 7
> Reporter: Anand Subramanian
> Assignee: Michael Miklavcic
>
> I am noticing that Metron deploy is failing when enabling Kerberos on a
> 12-node VM cluster managed by Ambari 2.5.2.
> The error is seen during the "Stop Services" step while kerberizing for
> Elasticsearch Master and Elasticsearch Data Node services.
> I confirmed that the same deployment goes through fine on Ambari 2.4.2,
> where I am able to set up the Kerberized cluster without errors.
> For Ambari 2.4, for the "Elasticsearch Data Node Stop" step, we stop the
> slave, and do not check on the status of the service after the 'service stop'
> command was issued. But with Ambari 2.5, we attempt to check the status after
> the service stop command was issued.
> *In Ambari 2.4*
> {code}
> stdout:
> Stop the Slave
> 2017-11-07 10:21:27,755 - Execute['service elasticsearch stop'] {}
> Command completed successfully!
> {code}
> *In Ambari 2.5*
> {code}
> Stop the Slave
> 2017-11-07 10:12:48,481 - Execute['service elasticsearch stop'] {}
> 2017-11-07 10:12:48,599 - Waiting for actual component stop
> Status of the Slave
> 2017-11-07 10:12:48,600 - Execute['service elasticsearch status'] {}
> Command failed after 1 tries
> {code}
> Apparently the status command returns exit code 3, which the Ambari agent
> treats as an error, so it marks the step as failed.
> I am not sure entirely if this is something to be handled by Metron or by
> Ambari. Please feel free to close this defect in case this is deemed out of
> scope of Metron.
> Here is the full error log from the UI
> {code}
> stderr:
> Traceback (most recent call last):
>   File "/var/lib/ambari-agent/cache/common-services/ELASTICSEARCH/2.3.3/package/scripts/elastic_slave.py", line 71, in <module>
>     Elasticsearch().execute()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 332, in execute
>     self.execute_prefix_function(self.command_name, 'after', env)
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 350, in execute_prefix_function
>     method(env)
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 398, in after_stop
>     status_method(env)
>   File "/var/lib/ambari-agent/cache/common-services/ELASTICSEARCH/2.3.3/package/scripts/elastic_slave.py", line 59, in status
>     Execute(status_cmd)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__
>     self.env.run()
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
>     self.run_action(resource, action)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
>     provider_action()
>   File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
>     tries=self.resource.tries, try_sleep=self.resource.try_sleep)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
>     result = function(command, **kwargs)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
>     tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
>     result = _call(command, **kwargs_copy)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
>     raise ExecutionFailed(err_msg, code, out, err)
> resource_management.core.exceptions.ExecutionFailed: Execution of 'service elasticsearch status' returned 3. ● elasticsearch.service - Elasticsearch
>    Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; disabled; vendor preset: disabled)
>    Active: inactive (dead)
>      Docs: http://www.elastic.co
> Nov 07 10:12:47 metron-12 elasticsearch[25937]: [2017-11-07
> 10:12:47,340][INFO ][cluster.service ] [metron-12.openstacklocal]
> removed
> {{metron-9.openstacklocal}{lTJDzEA6Sp6_6ryTY8XSJQ}{172.22.97.19}{172.22.97.19:9300}{master=false},},
> reason:
> zen-disco-node_left({metron-9.openstacklocal}{lTJDzEA6Sp6_6ryTY8XSJQ}{172.22.97.19}{172.22.97.19:9300}{master=false})
> Nov 07 10:12:47 metron-12 elasticsearch[25937]: [2017-11-07
> 10:12:47,466][INFO ][cluster.service ] [metron-12.openstacklocal]
> removed
> {{metron-8.openstacklocal}{Q7pgb5LLSj-oHMxld-DFfw}{172.22.97.188}{172.22.97.188:9300}{master=false},},
> reason:
> zen-disco-node_left({metron-8.openstacklocal}{Q7pgb5LLSj-oHMxld-DFfw}{172.22.97.188}{172.22.97.188:9300}{master=false})
> Nov 07 10:12:47 metron-12 elasticsearch[25937]: [2017-11-07
> 10:12:47,548][INFO ][cluster.service ] [metron-12.openstacklocal]
> removed
> {{metron-2.openstacklocal}{8JdEI93MQPeDxD63tMKrRQ}{172.22.96.83}{172.22.96.83:9300}{master=false},},
> reason:
> zen-disco-node_left({metron-2.openstacklocal}{8JdEI93MQPeDxD63tMKrRQ}{172.22.96.83}{172.22.96.83:9300}{master=false})
> Nov 07 10:12:47 metron-12 elasticsearch[25937]: [2017-11-07
> 10:12:47,713][INFO ][cluster.service ] [metron-12.openstacklocal]
> removed
> {{metron-5.openstacklocal}{643SMG8xSLOuFEZpuMNeQg}{172.22.97.119}{172.22.97.119:9300}{master=false},},
> reason:
> zen-disco-node_left({metron-5.openstacklocal}{643SMG8xSLOuFEZpuMNeQg}{172.22.97.119}{172.22.97.119:9300}{master=false})
> Nov 07 10:12:48 metron-12 systemd[1]: Stopping Elasticsearch...
> Nov 07 10:12:48 metron-12 elasticsearch[25937]: [2017-11-07
> 10:12:48,417][INFO ][node ] [metron-12.openstacklocal]
> stopping ...
> Nov 07 10:12:48 metron-12 elasticsearch[25937]: [2017-11-07
> 10:12:48,456][INFO ][node ] [metron-12.openstacklocal]
> stopped
> Nov 07 10:12:48 metron-12 elasticsearch[25937]: [2017-11-07
> 10:12:48,456][INFO ][node ] [metron-12.openstacklocal]
> closing ...
> Nov 07 10:12:48 metron-12 elasticsearch[25937]: [2017-11-07
> 10:12:48,491][INFO ][node ] [metron-12.openstacklocal]
> closed
> Nov 07 10:12:48 metron-12 systemd[1]: Stopped Elasticsearch.
> stdout:
> Stop the Slave
> 2017-11-07 10:12:49,025 - Execute['service elasticsearch stop'] {}
> 2017-11-07 10:12:49,089 - Waiting for actual component stop
> Status of the Slave
> 2017-11-07 10:12:49,090 - Execute['service elasticsearch status'] {}
> Command failed after 1 tries
> {code}