Hi.
I've upgraded to 0.20.0-rc3. That resolved the issues I was seeing with
0.19.1. Things seem to be working, but I did run into one problem. I have
a map/reduce job that uses HBase and I see this error (stack traces below)
when I start the job on one of the nodes, but if I start it on a different
node, it runs successfully.
The node that fails is not running a ZK instance, while the node that
works is. The stack trace also shows the ZooKeeper client trying to connect
to 127.0.0.1, which is bound to fail on a node with no local ZK. But why is
it trying localhost at all? My hbase-site.xml lists 5 quorum servers by IP
address, and that config file is on the client's classpath (see the last
entry in the classpath log statement below).
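One way to narrow this down is to check whether hbase-site.xml is actually loadable as a classpath resource on the failing node. The sketch below uses only the JDK. The class name ConfigVisibilityCheck and the locate helper are made up for illustration, and it assumes the HBase client picks the file up as a classpath resource named hbase-site.xml.

```java
import java.net.URL;

public class ConfigVisibilityCheck {

    // Hypothetical helper: returns the URL the class loader resolves for
    // the given resource name, or null if the resource is not visible.
    static URL locate(String resourceName) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = ClassLoader.getSystemClassLoader();
        }
        return cl.getResource(resourceName);
    }

    public static void main(String[] args) {
        // Assumption: the client looks the file up under this resource name.
        String name = args.length > 0 ? args[0] : "hbase-site.xml";
        URL url = locate(name);
        if (url == null) {
            System.out.println(name + " is NOT visible as a classpath resource");
        } else {
            System.out.println(name + " resolved to " + url);
        }
        // Print the effective classpath for comparison with the job's log.
        System.out.println("java.class.path=" + System.getProperty("java.class.path"));
    }
}
```

Run it with the same classpath as the job. Java treats each classpath entry as a directory or an archive, so an entry that names the .xml file itself, rather than the directory containing it, may not make the file loadable. If the check reports the file as not visible, that would be consistent with the client falling back to a default quorum address.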
Marc
...
2009-09-09 09:08:14,991 INFO zookeeper.ZooKeeper (Environment.java:logEnv(97)) - Client environment:java.class.path=/opt/feeva/subscriber-db/bin/subscriber-db.jar:lib/cascading-core-1.0.10.jar:lib/cascading-test-1.0.10.jar:lib/cascading-xml-1.0.10.jar:lib/commons-beanutils-1.8.0.jar:lib/commons-cli-1.2.jar:lib/commons-codec-1.3.jar:lib/commons-collections-3.2.1.jar:lib/commons-dbcp-1.2.2.jar:lib/commons-httpclient-3.0.1.jar:lib/commons-lang-2.4.jar:lib/commons-logging-1.1.1.jar:lib/commons-logging-api-1.1.1.jar:lib/commons-net-1.4.1.jar:lib/commons-pool-1.4.jar:lib/common-util-1.6.2.jar:lib/common-util-1.7.0.jar:lib/ezmorph-1.0.6.jar:lib/groovy-all-1.6.2.jar:lib/hadoop-core-0.20.0.jar:lib/hbase-0.20.0-rc3.jar:lib/janino-2.5.15.jar:lib/jgrapht-jdk1.6-1.0.10.jar:lib/json-lib-2.2.3-jdk15.jar:lib/junit-4.6.jar:lib/log4j-1.2.15.jar:lib/mysql-connector-java-5.1.7-bin.jar:lib/ops-db-1.0.4.jar:lib/ops-db-1.2.1.jar:lib/oro-2.0.8.jar:lib/postgresql-8.3-604.jdbc4.jar:lib/slf4j-api-1.4.3.jar:lib/slf4j-log4j12-1.4.3.jar:lib/subscriber-db-data-1.0.0.jar:lib/subscriber-db-data-1.1.0.jar:lib/zookeeper-r785019-hbase-1329.jar:/usr/local/hbase/conf/hbase-site.xml
...
2009-09-09 09:08:15,017 WARN zookeeper.ClientCnxn (ClientCnxn.java:cleanup(958)) - Ignoring exception during shutdown output
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:956)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:922)
2009-09-09 09:08:15,125 INFO client.HConnectionManager$TableServers (HConnectionManager.java:getMaster(331)) - getMaster attempt 0 of 10 failed; retrying after sleep of 2000
java.io.IOException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:331)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readMasterAddressOrThrow(ZooKeeperWrapper.java:240)
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getMaster(HConnectionManager.java:315)
at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:72)
at [... my code...]
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:750)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:327)
... 38 more
2009-09-09 09:08:16,421 INFO zookeeper.ClientCnxn (ClientCnxn.java:startConnect(821)) - Attempting connection to server localhost/127.0.0.1:2181
2009-09-09 09:08:16,422 WARN zookeeper.ClientCnxn (ClientCnxn.java:run(919)) - Exception closing session 0x0 to sun.nio.ch.selectionkeyi...@4aab7165
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:885)
...
And finally...
Exception in thread "main" org.apache.hadoop.hbase.MasterNotRunningException
at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getMaster(HConnectionManager.java:347)
at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:72)
at [...my code...]