I will dig in Monday, James. On a cluster restart, deleting state up in zk is fine; the restart will just run w/o previous state. Deleting state from zk on a running cluster is bad. It will more than likely mess the cluster up, since the regions-in-transition records kept up in zk are erased.
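A minimal sketch of that rule, not HBase code: only clear the ZooKeeper data dir when nothing answers on the master port. The function names, the `zk_data_dir` argument, and the use of a TCP probe as the "is the cluster running?" check are all assumptions made for illustration. Port 60000 is assumed to be the master port in this setup, and the real directory is whatever your ZooKeeper data dir is configured to.

```python
import shutil
import socket
from pathlib import Path


def port_is_open(host, port, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat as "nothing running".
        return False


def clear_zk_state(zk_data_dir, master_host="localhost", master_port=60000):
    """Delete the ZooKeeper data dir only if no master answers.

    Hypothetical helper: the port probe is a stand-in for a proper
    "cluster is down" check. Returns True if a directory was removed,
    False if there was nothing to remove.
    """
    if port_is_open(master_host, master_port):
        # Running cluster: regions-in-transition state lives in zk,
        # so refuse rather than erase it.
        raise RuntimeError(
            "cluster appears to be running; refusing to delete ZooKeeper state"
        )
    path = Path(zk_data_dir)
    if path.exists():
        shutil.rmtree(path)
        return True
    return False
```

A port probe is a crude check; it only shows the shape of the guard. Stop the cluster first, then call `clear_zk_state(...)` with your own data dir, then restart.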
Stack

On Jan 14, 2011, at 10:52, James Kennedy <james.kenn...@troove.net> wrote:

> Negative. I deleted the zookeeper dir and HMaster still managed to pull the
> wrong IP address from somewhere.
>
> I don't have a lot of time to really investigate this myself but I'll try to
> reproduce it with a basic test and log a case for it.
>
> By the way, can someone clarify the side-effects of deleting the zookeeper
> dir like that? I assume it has no ill effect on the data itself especially
> when the cluster is down. But what is the worst that can happen if you delete
> the dir while the cluster is running?
>
> Thanks
>
> James
>
> On 2011-01-14, at 9:54 AM, Stack wrote:
>
>> It does seem like a regression. If u kill the zk data dir and restart the
>> cluster does it work? (root location is up in zk)
>>
>> Stack
>>
>> On Jan 13, 2011, at 11:37, James Kennedy <james.kenn...@troove.net> wrote:
>>
>>> I'm currently validating the new 0.90.0 RC3 with the hbase-trx layer and
>>> our own application.
>>>
>>> All seems well so far except for the fact that I now find that HBase
>>> doesn't adapt if I try to run the same data on different machines.
>>>
>>> e.g.
>>> 1) I work from home and generated our seeded test data.
>>> 2) Run the test suite and all tests pass
>>> 3) I go to the office and re-run the tests.
>>>
>>> Result: HMaster fails because the .ROOT data has the wrong ip address for
>>> locating the .META. At least that is my understanding from the stacktrace
>>> below. Note that the 192.168.1.102 IP address in that trace is the IP from
>>> my home network and is incorrect.
>>>
>>> This wasn't an issue with previous versions of HBase as far as I've
>>> noticed. And this seems to be a big data portability fail.
>>> Surely the HMaster should be able to absorb stale metadata and wait for new
>>> region-servers to check in.
>>> Instead it just keels over and dies.
>>> But before logging a case I wanted to know if there was something I'm
>>> obviously missing or doing wrong.
>>>
>>> The seeded test data is on HDFS.
>>>
>>> Thoughts?
>>>
>>>
>>> [13/01/11 10:58:42] 5939 [ main] INFO ion.service.HBaseRegionService - troove> Starting region server thread.
>>> [13/01/11 11:00:15] 98699 [ HMaster] FATAL he.hadoop.hbase.master.HMaster - Unhandled exception. Starting shutdown.
>>> java.net.SocketTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=192.168.1.102/192.168.1.102:60020]
>>>     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:213)
>>>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
>>>     at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:311)
>>>     at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:865)
>>>     at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:732)
>>>     at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:258)
>>>     at $Proxy15.getProtocolVersion(Unknown Source)
>>>     at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:419)
>>>     at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:393)
>>>     at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:444)
>>>     at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:349)
>>>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:954)
>>>     at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:384)
>>>     at org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:283)
>>>     at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyMetaRegionLocation(CatalogTracker.java:478)
>>>     at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:435)
>>>     at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:382)
>>>     at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:277)
>>>     at java.lang.Thread.run(Thread.java:680)
>>>
>>>
>>> James Kennedy
>>> Troove Inc.