[
https://issues.apache.org/jira/browse/AMBARI-5800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002584#comment-14002584
]
Sudhir Prakash commented on AMBARI-5800:
----------------------------------------
Right now my HiveServer2 takes ~15 seconds to start and I am not seeing this
issue. Previously it took 55 seconds and I was seeing it. I think the issue is
that the Hive Check script needs to wait longer before failing out.
{code}
Test connectivity to hive server
Successfully connected to byn001-2 on port 10000
2014-05-19 14:37:29,157 - File['/tmp/hcatSmoke.sh'] {'content':
StaticFile('hcatSmoke.sh'), 'mode': 0755}
2014-05-19 14:37:29,158 - Execute['env JAVA_HOME=/opt/teradata/jvm64/jdk7
/tmp/hcatSmoke.sh hcatsmokeid0x270508_date371914 prepare'] {'logoutput': True,
'path': ['/usr/sbin', '/usr/local/nin', '/bin', '/usr/bin'], 'tries': 3,
'user': 'ambari-qa', 'try_sleep': 5}
{code}
Is the service check waiting only 3 tries x 5 seconds = 15 seconds before
giving up? If so, this needs to change to a much larger value to accommodate
servers with different performance characteristics.
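To illustrate the kind of wait I have in mind, here is a minimal sketch in Python 3. It is not Ambari's code: the function name {{wait_for_port}} and the {{timeout}} and {{interval}} parameters are hypothetical, and the 120 second default is only an assumed value that comfortably covers the 55 second startup I observed. It polls the port until a TCP connection succeeds or an overall deadline passes, instead of giving up after a fixed small number of tries.

```python
import socket
import time


def wait_for_port(host, port, timeout=120.0, interval=5.0):
    """Poll host:port until a TCP connection succeeds or timeout elapses.

    Hypothetical helper for illustration only. Returns True if the port
    accepted a connection before the deadline, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means the server has claimed the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: not listening yet.
            pass
        if time.monotonic() >= deadline:
            return False
        # Sleep between attempts, but never past the deadline.
        time.sleep(min(interval, max(0.0, deadline - time.monotonic())))
```

With something like this, the check would call {{wait_for_port("byn001-2", 10000)}} before running the smoke script, and only report a failure if the port is still closed when the deadline passes.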
> Race condition when starting all services causing Hive service check to fail
> ----------------------------------------------------------------------------
>
> Key: AMBARI-5800
> URL: https://issues.apache.org/jira/browse/AMBARI-5800
> Project: Ambari
> Issue Type: Bug
> Affects Versions: 1.6.0
> Environment: SLES11
> ambari-server-1.6.0-39
> hive-0.13.0.2.1.2.0-402
> Reporter: Sudhir Prakash
> Priority: Critical
>
> # I performed an install on a 7 node cluster
> # During the install, I noticed that the Hive service check failed with the
> error: {{Test connectivity to hive server Connection to byn001-1 on port
> 10000 failed: [Errno 111] Connection refused}}
> # I proceeded through the rest of the install wizard
> # Stop All
> # Start All and noticed the same error again
> I retried stop all/start all, this time monitoring the Ambari start progress,
> the Hive Server2 logs, and a netstat of port 10000. What I noticed is that
> immediately after the start Hive is issued, the service check is run and
> fails. However, it takes about 55 seconds for HiveServer2 to actually start
> and claim port 10000.
> The start up sequence needs to be modified to wait for Hive to finish
> starting before running the service check.
> This issue is easily reproducible and has been seen by multiple people there.