[ https://issues.apache.org/jira/browse/AMBARI-5800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002584#comment-14002584 ]

Sudhir Prakash commented on AMBARI-5800:
----------------------------------------

My HiveServer2 now takes ~15 seconds to start and I no longer see this issue. 
Previously it took 55 seconds to start and the issue occurred every time. I 
think the problem is that the Hive Check script needs to wait longer before 
failing out.

{code}
Test connectivity to hive server
Successfully connected to byn001-2 on port 10000
2014-05-19 14:37:29,157 - File['/tmp/hcatSmoke.sh'] {'content': 
StaticFile('hcatSmoke.sh'), 'mode': 0755}
2014-05-19 14:37:29,158 - Execute['env JAVA_HOME=/opt/teradata/jvm64/jdk7 
/tmp/hcatSmoke.sh hcatsmokeid0x270508_date371914 prepare'] {'logoutput': True, 
'path': ['/usr/sbin', '/usr/local/nin', '/bin', '/usr/bin'], 'tries': 3, 
'user': 'ambari-qa', 'try_sleep': 5}
{code}

Is the service check waiting only 3 x 5 = 15 seconds before giving up? If so, 
this needs to change to a much larger value to accommodate servers with 
different performance characteristics.
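
To illustrate the kind of wait I have in mind, here is a minimal Python 3 sketch that polls a TCP port until it accepts a connection or an overall deadline passes. This is not the actual Ambari check code: the function name {{wait_for_port}} and the {{timeout}} and {{interval}} defaults are hypothetical and would need tuning.

```python
import socket
import time


def wait_for_port(host, port, timeout=120.0, interval=5.0):
    """Return True once host:port accepts a TCP connection,
    or False if `timeout` seconds pass first.

    Hypothetical helper for illustration; not part of Ambari.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Each attempt is bounded by `interval` seconds.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: the server is not up yet.
            pass
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Sleep between attempts without overshooting the deadline.
        time.sleep(min(interval, remaining))
```

A check could call something like {{wait_for_port("byn001-2", 10000, timeout=120)}} before running the smoke script, so that a HiveServer2 needing 55 seconds to claim port 10000 would still pass.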

> Race condition when starting all services causing Hive service check to fail
> ----------------------------------------------------------------------------
>
>                 Key: AMBARI-5800
>                 URL: https://issues.apache.org/jira/browse/AMBARI-5800
>             Project: Ambari
>          Issue Type: Bug
>    Affects Versions: 1.6.0
>         Environment: SLES11
> ambari-server-1.6.0-39
> hive-0.13.0.2.1.2.0-402
>            Reporter: Sudhir Prakash
>            Priority: Critical
>
> # I performed an install on a 7 node cluster
> # During the install, I noticed that the Hive service check failed with the 
> error: {{Test connectivity to hive server Connection to byn001-1 on port 
> 10000 failed: [Errno 111] Connection refused}}
> # I proceeded through the rest of the install wizard
> # Stop All
> # Start All and noticed the same error again
> I retried Stop All/Start All, this time monitoring the Ambari start progress, 
> the Hive Server2 logs, and a netstat of port 10000. What I noticed is that 
> immediately after the start Hive is issued, the service check is run and 
> fails. However, it takes about 55 seconds for HiveServer2 to actually start 
> and claim port 10000. 
> The start up sequence needs to be modified to wait for Hive to finish 
> starting before running the service check.
> This issue is easily reproducible and has been seen by multiple people there.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
