[
https://issues.apache.org/jira/browse/IMPALA-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Smith resolved IMPALA-13132.
------------------------------------
Fix Version/s: Impala 4.5.0
Resolution: Fixed
> Ozone jobs see intermittent termination of Ozone manager / HMS fails to start
> -----------------------------------------------------------------------------
>
> Key: IMPALA-13132
> URL: https://issues.apache.org/jira/browse/IMPALA-13132
> Project: IMPALA
> Issue Type: Bug
> Components: Infrastructure
> Affects Versions: Impala 4.5.0
> Reporter: Joe McDonnell
> Assignee: Michael Smith
> Priority: Critical
> Labels: broken-build, flaky
> Fix For: Impala 4.5.0
>
>
> Ozone jobs load data/metadata snapshots during dataload, then restart the
> cluster. On this restart, the HMS sometimes fails to come up:
> {noformat}
> 16:04:13 --> Starting Hive Metastore Service
> 16:04:13 No handlers could be found for logger "thrift.transport.TSocket"
> 16:04:14 Waiting for the Metastore at localhost:9083...
> ...
> 16:09:14 Waiting for the Metastore at localhost:9083...
> 16:09:14 Metastore service failed to start within 300.0 seconds.{noformat}
> In the metastore logs, we see messages like this:
> {noformat}
> 2024-06-04T08:37:06,425 INFO [main] retry.RetryInvocationHandler:
> com.google.protobuf.ServiceException: java.net.ConnectException: Call From
> hostname/127.0.0.1 to localhost:9862 failed on connection exception:
> java.net.ConnectException: Connection refused; For more details see:
> http://wiki.apache.org/hadoop/ConnectionRefused, while invoking
> $Proxy31.submitRequest over nodeId=null,nodeAddress=localhost:9862 after 1
> failover attempts. Trying to failover after sleeping for 4000ms.{noformat}
> The HMS is trying to talk to the Ozone manager. The Ozone cluster was back up
> and running before the HMS start was attempted, but the Ozone manager then
> received a SIGTERM and shut down:
> {noformat}
> 24/06/04 08:36:37 ERROR om.OzoneManagerStarter: RECEIVED SIGNAL 15: SIGTERM
> 24/06/04 08:36:37 INFO om.OzoneManagerStarter: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down OzoneManager at hostname/127.0.0.1
> ************************************************************/
> 24/06/04 08:36:37 INFO om.OzoneManager: om1[localhost:9862]: Stopping Ozone
> Manager{noformat}
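
When triaging a failure like this, it may help to confirm that the Ozone manager port is still accepting connections right before the HMS is started. The following is a minimal sketch of such a check, not the fix applied for this issue. The helper name wait_for_port and its parameters are hypothetical; only the address localhost:9862 comes from the metastore log above.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=300.0, interval_s=1.0):
    """Poll a TCP port until it accepts a connection or the timeout expires.

    Hypothetical helper for illustration. Returns True as soon as a connect
    succeeds, False if no connect succeeded before timeout_s elapsed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Connection refused or timed out: the service is not up (yet).
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


if __name__ == "__main__":
    # Assumption: the Ozone manager listens on localhost:9862, as in the log.
    if not wait_for_port("localhost", 9862, timeout_s=30.0):
        print("Ozone manager is not reachable on localhost:9862")
```

A check like this only reports that the port is closed. It does not explain which process sent the SIGTERM, which still has to be determined from the cluster start and stop scripts and their logs.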