Dear all,
We recently compiled Impala in our development environment, and when we run
the compiled Impala we hit the following problem.
Problem: Impala runs successfully as long as we do not reboot the machine.
After a reboot, however, we cannot restart the Impala processes. We tried
several machines, and the problem occurs on every one of them.
We have struggled with this for a long time without success, and we are
wondering whether you can help us solve it.
The environment and error messages are as follows.
Environment:
OS: Distributor ID: CentOS
Description: CentOS Linux release 7.2.1511 (Core)
Release: 7.2.1511
Codename: Core
Kernel: Linux version 3.10.0-327.28.2.el7.x86_64
Impala version: cdh5-trunk
1. We start Impala with ${IMPALA_HOME}/testdata/bin/run-all.sh and get the
following output:
[root@localhost rtap-on-impala]# ${IMPALA_HOME}/testdata/bin/run-all.sh
Killing running services...
Starting all cluster services...
--> Starting mini-DFS cluster
Stopping kms
Stopping llama
Stopping yarn
Stopping hdfs
Starting hdfs (Web UI - http://localhost:5070)
....Namenode started
Starting yarn (Web UI - http://localhost:8088)
Starting llama (Web UI - http://localhost:1501)
Starting kms (Web UI - http://localhost:16000)
The cluster is running
--> Starting HBase
localhost: starting zookeeper, logging to /home/linxiaoyong/impala_new/rtap-on-impala/impala/cluster_logs/hbase/hbase-root-zookeeper-localhost.localdomain.out
starting master, logging to /home/linxiaoyong/impala_new/rtap-on-impala/impala/cluster_logs/hbase/hbase-root-master-localhost.localdomain.out
16/09/28 17:15:52 INFO util.VersionInfo: HBase 1.2.0-cdh5.8.0-SNAPSHOT
16/09/28 17:15:52 INFO util.VersionInfo: Source code repository file:///var/lib/jenkins/workspace/generic-binary-tarball-and-maven-deploy/CDH5-Packaging-HBase-2016-02-24_17-14-20/hbase-1.2.0-cdh5.8.0-SNAPSHOT revision=Unknown
16/09/28 17:15:52 INFO util.VersionInfo: Compiled by jenkins on Wed Feb 24 17:26:12 PST 2016
16/09/28 17:15:52 INFO util.VersionInfo: From source with checksum 2c2f0626ababf9b47e88728c663df5c7
Waiting for HBase Master
...........................Failure
Hbase master did NOT write /hbase/rs in 30.4s
Error in /home/linxiaoyong/impala_new/rtap-on-impala/impala/testdata/bin/run-hbase.sh at line 87: ${CLUSTER_BIN}/wait-for-hbase-master.py
Error in /home/linxiaoyong/impala_new/rtap-on-impala/impala/testdata/bin/run-all.sh at line 48: tee ${IMPALA_TEST_CLUSTER_LOG_DIR}/run-hbase.log
2. We opened cluster_logs/hbase/hbase-root-master-localhost.localdomain.out
in Vim. It contains the following errors:
16/09/28 17:16:10 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
16/09/28 17:16:10 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
16/09/28 17:16:11 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
16/09/28 17:16:11 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
16/09/28 17:16:11 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
16/09/28 17:16:11 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper create failed after 4 attempts
16/09/28 17:16:11 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
16/09/28 17:16:11 ERROR master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster.
    at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2428)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:232)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:138)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
    at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2438)
Caused by: org.apache.hadoop.hbase.ZooKeeperConnectionException: master:600000x0, quorum=localhost:2181, baseZNode=/hbase Unexpected KeeperException creating base node
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.createBaseZNodes(ZooKeeperWatcher.java:206)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:187)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:590)
    at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:375)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2421)
    ... 5 more
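
The log above shows the HBase master being refused on localhost:2181 over both IPv4 (127.0.0.1) and IPv6 (0:0:0:0:0:0:0:1). A small check like the following could confirm whether anything is listening on that port while the master is starting. It is only a sketch: the helper name port_is_open and the 2-second timeout are our own choices and are not part of Impala or HBase.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and unresolvable addresses.
        return False


if __name__ == "__main__":
    # 2181 is the ZooKeeper port named in the log (quorum=localhost:2181).
    for host in ("127.0.0.1", "::1"):
        state = "open" if port_is_open(host, 2181) else "refused or unreachable"
        print(f"{host}:2181 is {state}")
```

If both addresses are reported as refused while HQuorumPeer is running, that would suggest the ZooKeeper process is not yet listening, or is bound to a different address, at the time the master tries to connect.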
I used “jps” to list the running Java processes:
[root@localhost rtap-on-impala]# jps
26528 LlamaAMMain
25921 NodeManager
25186 DataNode
25890 NodeManager
29188 Jps
25221 DataNode
25864 NodeManager
25162 DataNode
26635 Bootstrap
14194 -- process information unavailable
25246 NameNode
25950 ResourceManager
27423 HQuorumPeer
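
The jps listing above can also be checked mechanically. The sketch below parses that output and reports which expected HBase processes are absent. The helper name running_java_processes and the expected set are our own assumptions; HMaster is taken from the class name in the stack trace, and HQuorumPeer from the listing itself.

```python
def running_java_processes(jps_output):
    """Return the set of main-class names found in `jps` output."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split(maxsplit=1)
        # Skip blank lines and entries such as
        # "14194 -- process information unavailable".
        if len(parts) == 2 and parts[0].isdigit() and not parts[1].startswith("--"):
            names.add(parts[1].strip())
    return names


# The jps output from this report, pasted verbatim.
sample = """26528 LlamaAMMain
25921 NodeManager
25186 DataNode
25890 NodeManager
29188 Jps
25221 DataNode
25864 NodeManager
25162 DataNode
26635 Bootstrap
14194 -- process information unavailable
25246 NameNode
25950 ResourceManager
27423 HQuorumPeer
"""

# Assumed set of processes that HBase needs in this mini-cluster.
expected = {"HQuorumPeer", "HMaster"}
missing = expected - running_java_processes(sample)
print(sorted(missing))  # prints ['HMaster']
```

For this listing the check reports that HQuorumPeer is present but HMaster is not, which matches the "Master exiting" error in the log.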