[ https://issues.apache.org/jira/browse/METRON-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16298950#comment-16298950 ]

ASF GitHub Bot commented on METRON-1348:
----------------------------------------

Github user anandsubbu commented on the issue:

    https://github.com/apache/metron/pull/864
  
    Hi @nickwallen, I tried this on a 12-node cluster. I validated that `clusterHostInfo` is populated properly for the alerts_ui, management_ui and rest_ui hosts.
    
    However, in my case it failed on the parser service check because the 'Metron Check' step landed on a host without a Kafka broker installed.
    
    Here's the error excerpt:
    ```
    <snip>
    2017-12-20 18:42:54,285 - Performing Parser service check
    2017-12-20 18:42:54,285 - Checking for grok patterns in HDFS for Parsers
    2017-12-20 18:42:54,285 - Checking HDFS; directory=/apps/metron/patterns user=metron
    2017-12-20 18:42:54,285 - Execute['/usr/hdp/2.5.3.0-37/hadoop/bin/hdfs dfs -test -d /apps/metron/patterns'] {'logoutput': True, 'path': ['/usr/sbin:/sbin:/usr/local/bin:/bin:/usr/bin'], 'tries': 3, 'user': 'metron', 'try_sleep': 5}
    2017-12-20 18:42:56,822 - Checking Kafka topics for Parsers
    2017-12-20 18:42:56,822 - Checking existence of Kafka topic 'bro'
    2017-12-20 18:42:56,823 - Execute['/usr/hdp/current/kafka-broker/bin/kafka-topics.sh       --zookeeper metronc-1.openstacklocal:2181,metronc-11.openstacklocal:2181,metronc-10.openstacklocal:2181       --list |       awk 'BEGIN {cnt=0;} /bro/ {cnt++} END {if (cnt > 0) {exit 0} else {exit 1}}''] {'logoutput': True, 'path': ['/usr/sbin:/sbin:/usr/local/bin:/bin:/usr/bin'], 'tries': 3, 'user': 'kafka', 'try_sleep': 5}
    -bash: /usr/hdp/current/kafka-broker/bin/kafka-topics.sh: No such file or directory
    2017-12-20 18:42:56,900 - Retrying after 5 seconds. Reason: Execution of '/usr/hdp/current/kafka-broker/bin/kafka-topics.sh       --zookeeper metronc-1.openstacklocal:2181,metronc-11.openstacklocal:2181,metronc-10.openstacklocal:2181       --list |       awk 'BEGIN {cnt=0;} /bro/ {cnt++} END {if (cnt > 0) {exit 0} else {exit 1}}'' returned 1. -bash: /usr/hdp/current/kafka-broker/bin/kafka-topics.sh: No such file or directory
    -bash: /usr/hdp/current/kafka-broker/bin/kafka-topics.sh: No such file or directory
    2017-12-20 18:43:01,987 - Retrying after 5 seconds. Reason: Execution of '/usr/hdp/current/kafka-broker/bin/kafka-topics.sh       --zookeeper metronc-1.openstacklocal:2181,metronc-11.openstacklocal:2181,metronc-10.openstacklocal:2181       --list |       awk 'BEGIN {cnt=0;} /bro/ {cnt++} END {if (cnt > 0) {exit 0} else {exit 1}}'' returned 1. -bash: /usr/hdp/current/kafka-broker/bin/kafka-topics.sh: No such file or directory
    -bash: /usr/hdp/current/kafka-broker/bin/kafka-topics.sh: No such file or directory
    
    Command failed after 1 tries
    <snip>
    ```
    
    I noticed that `clusterHostInfo` indeed has a list of the `kafka_broker_hosts` (see attached [clusterHostInfo-12-node.txt](https://github.com/apache/metron/files/1576752/clusterHostInfo-12-node.txt)). Would it be possible to either a) force Ambari to run the Metron service check on one of the Kafka broker hosts, or b) run [check_kafka_topics](https://github.com/apache/metron/blob/master/metron-deployment/packaging/ambari/metron-mpack/src/main/resources/common-services/METRON/CURRENT/package/scripts/metron_service.py#L259) on a `kafka_broker_host`?
    
    I am perfectly fine if you think the kafka_broker fix belongs in a separate PR.
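
A minimal sketch of the host selection behind option b), assuming the check can read `clusterHostInfo` as a plain dict mapping component keys to host lists, as in the attached file. The helper `pick_kafka_check_host` is hypothetical and not part of `metron_service.py`; it only decides where the topic check should run and does not itself dispatch the command to another host.

```python
from typing import Dict, List, Optional


def pick_kafka_check_host(cluster_host_info: Dict[str, List[str]], local_host: str) -> Optional[str]:
    """Choose the host on which check_kafka_topics should run (hypothetical helper).

    Prefers the local host when it is itself a Kafka broker, so no remote
    execution is needed; otherwise returns the first entry listed under
    'kafka_broker_hosts'. Returns None if no broker is listed at all.
    """
    brokers = cluster_host_info.get("kafka_broker_hosts") or []
    if local_host in brokers:
        return local_host
    return brokers[0] if brokers else None
```

When the result differs from the local host, something still has to execute the check on that broker; option a) sidesteps this by running the whole service check on a broker host.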


> Metron Service Checks Use Wrong Hostname
> ----------------------------------------
>
>                 Key: METRON-1348
>                 URL: https://issues.apache.org/jira/browse/METRON-1348
>             Project: Metron
>          Issue Type: Bug
>            Reporter: Nick Allen
>            Assignee: Nick Allen
>
> The Metron service check can often use the incorrect hostname when checking the Alerts UI, Management UI, and REST services.
> Ambari can run the service check on any node in the cluster, not just the node the service is actually running on.  The service check code currently uses the hostname on which the service check is running.  If the service is not actually installed on that host, the service check will incorrectly fail.
> The service check code should be updated to find the hostname where the service is installed and use that hostname.
> For example, here is a log of a service check that is looking on the wrong host for the Metron REST service.
> {code}
> 2017-12-08 17:11:30,433 - Checking connectivity to REST application
> 2017-12-08 17:11:30,434 - Checking HTTP connectivity; host=hcpua-10.openstacklocal, port=8082, user=metron cmd=curl -sS --max-time 3 hcpua-10.openstacklocal:8082
> 2017-12-08 17:11:30,434 - Execute['curl -sS --max-time 3 hcpua-10.openstacklocal:8082'] {'logoutput': False, 'tries': 3, 'user': 'metron', 'try_sleep': 5}
> 2017-12-08 17:11:30,471 - Retrying after 5 seconds. Reason: Execution of 'curl -sS --max-time 3 hcpua-10.openstacklocal:8082' returned 7. curl: (7) Failed to connect to hcpua-10.openstacklocal port 8082: Connection refused
> 2017-12-08 17:11:35,519 - Retrying after 5 seconds. Reason: Execution of 'curl -sS --max-time 3 hcpua-10.openstacklocal:8082' returned 7. curl: (7) Failed to connect to hcpua-10.openstacklocal port 8082: Connection refused
> {code}
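
A minimal sketch of the fix the issue describes, written as standalone Python 3 rather than against Ambari's resource_management library. It assumes `clusterHostInfo` is available as a dict and that the REST host is listed under a key such as `metron_rest_hosts`; that key name, like the helper names `service_host`, `curl_command` and `run_with_retries`, is an assumption for illustration. The retry loop mirrors the `'tries': 3` / `'try_sleep': 5` settings shown in the log.

```python
import subprocess
import time
from typing import Dict, List, Sequence


def service_host(cluster_host_info: Dict[str, List[str]], key: str, fallback: str) -> str:
    """Return the first host where a component is installed.

    Falls back to `fallback` (e.g. the local hostname) only when `key` is
    missing or its host list is empty.
    """
    hosts = cluster_host_info.get(key) or []
    return hosts[0] if hosts else fallback


def curl_command(host: str, port: int) -> List[str]:
    # Same flags as the logged check: silent but show errors, 3-second timeout.
    return ["curl", "-sS", "--max-time", "3", "{}:{}".format(host, port)]


def run_with_retries(cmd: Sequence[str], tries: int = 3, try_sleep: float = 5.0) -> bool:
    """Run `cmd` up to `tries` times, sleeping `try_sleep` seconds between failures.

    Returns True on the first zero exit code, False if every attempt fails.
    Raises FileNotFoundError if the executable itself does not exist.
    """
    for attempt in range(1, tries + 1):
        result = subprocess.run(list(cmd), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        if result.returncode == 0:
            return True
        if attempt < tries:
            time.sleep(try_sleep)
    return False
```

For example, `run_with_retries(curl_command(service_host(info, "metron_rest_hosts", local_host), 8082))` would probe the host listed for the REST component and use the local hostname only when no such host is recorded.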



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
