Thanks, Jean. My bad. I will try with the private hostnames later. I believe under EC2 they look something like this....
domU-12-31-38-00-9D-E3

On Fri, Dec 4, 2009 at 12:41 PM, Patrick Hunt <[email protected]> wrote:

> I'm not familiar with ec2, when you say "listen on private hostname" what
> does that mean? Do you mean "by default listen on an interface with a
> non-routable (localonly) ip"? Or something else. Is there an aws page you
> can point me to?
>
> Patrick
>
>
> Jean-Daniel Cryans wrote:
>
>> When you saw:
>>
>> org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete
>> /ebs1/mapred/system,/ebs2/mapred/system. Name node is in safe mode.
>> The ratio of reported blocks 0.0000 has not reached the threshold 0.9990.
>> *Safe mode will be turned off automatically*.
>>
>> It means that HDFS is blocking everything (aka safe mode) until all
>> datanodes reported for duty (and then it waits for 30 seconds to make
>> sure).
>>
>> When you saw:
>>
>> Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
>> KeeperErrorCode = *NoNode for /hbase/master*
>>
>> It means that the Master node didn't write his znode in Zookeeper
>> because... when you saw:
>>
>> 2009-12-04 07:07:37,149 WARN org.apache.zookeeper.ClientCnxn: Exception
>> closing session 0x0 to sun.nio.ch.selectionkeyi...@10e35d5
>> java.net.ConnectException: Connection refused
>>
>> It really means that the connection was refused. It then says it
>> attempted to connect to ec2-174-129-127-141.compute-1.amazonaws.com
>> but wasn't able to. AFAIK in EC2 the java processes tend to listen on
>> their private hostname not the public one (which would be bad
>> anyways).
>>
>> Bottom line, make sure stuff listens where they are expected and it
>> should then work well.
>>
>> J-D
>>
>> On Fri, Dec 4, 2009 at 11:23 AM, Something Something
>> <[email protected]> wrote:
>>
>>> Hadoop: 0.20.1
>>>
>>> HBase: 0.20.2
>>>
>>> Zookeeper: The one which gets started by default by HBase.
>>>
>>>
>>> HBase logs:
>>>
>>> 1) Master log shows this WARN message, but then it says 'connection
>>> successful'
>>>
>>>
>>> 2009-12-04 07:07:37,149 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x0 to sun.nio.ch.selectionkeyi...@10e35d5
>>> java.net.ConnectException: Connection refused
>>>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:933)
>>> 2009-12-04 07:07:37,150 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown input
>>> java.nio.channels.ClosedChannelException
>>>     at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
>>>     at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
>>>     at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:999)
>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
>>> 2009-12-04 07:07:37,150 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown output
>>> java.nio.channels.ClosedChannelException
>>>     at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
>>>     at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
>>>     at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1004)
>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
>>> 2009-12-04 07:07:37,199 INFO org.apache.hadoop.hbase.master.RegionManager: -ROOT- region unset (but not set to be reassigned)
>>> 2009-12-04 07:07:37,200 INFO org.apache.hadoop.hbase.master.RegionManager: ROOT inserted into regionsInTransition
>>> 2009-12-04 07:07:37,667 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server ec2-174-129-127-141.compute-1.amazonaws.com/10.252.146.65:2181
>>> 2009-12-04 07:07:37,668 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/10.252.162.19:46195 remote=ec2-174-129-127-141.compute-1.amazonaws.com/10.252.146.65:2181]
>>> 2009-12-04 07:07:37,670 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
>>>
>>>
>>>
>>> 2) Regionserver log shows this... but later seems to have recovered:
>>>
>>> 2009-12-04 07:07:36,576 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x0 to sun.nio.ch.selectionkeyi...@4ee70b
>>> java.net.ConnectException: Connection refused
>>>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:933)
>>> 2009-12-04 07:07:36,611 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown input
>>> java.nio.channels.ClosedChannelException
>>>     at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
>>>     at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
>>>     at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:999)
>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
>>> 2009-12-04 07:07:36,611 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown output
>>> java.nio.channels.ClosedChannelException
>>>     at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
>>>     at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
>>>     at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1004)
>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
>>> 2009-12-04 07:07:36,742 WARN org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master
>>> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
>>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
>>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:780)
>>>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.watchMasterAddress(ZooKeeperWrapper.java:304)
>>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.watchMasterAddress(HRegionServer.java:385)
>>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.reinitializeZooKeeper(HRegionServer.java:315)
>>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.reinitialize(HRegionServer.java:306)
>>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:276)
>>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>>>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>>>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.doMain(HRegionServer.java:2474)
>>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2542)
>>> 2009-12-04 07:07:36,743 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to set watcher on ZooKeeper master address. Retrying.
>>>
>>>
>>>
>>> 3) ZooKeeper log: Nothing much in there... just a starting message
>>> line, followed by
>>>
>>> ulimit -n 1024
>>>
>>> I looked at the archives. There was one mail that talked about 'ulimit'.
>>> Wonder if that has something to do with it.
>>>
>>> Thanks for your help.
>>>
>>>
>>>
>>> On Fri, Dec 4, 2009 at 8:18 AM, Mark Vigeant
>>> <[email protected]> wrote:
>>>
>>>> When I first started my hbase cluster, it too gave me the nonode for
>>>> /hbase/master several times before it started working, and I believe
>>>> this is a common beginner's error (I've seen it in a few emails in the
>>>> past 2 weeks).
>>>>
>>>> What versions of HBase, Hadoop and ZooKeeper are you using?
>>>>
>>>> Also, take a look in your HBASE_HOME/logs folder. That would be a good
>>>> place to start looking for some answers.
>>>>
>>>> -Mark
>>>>
>>>> -----Original Message-----
>>>> From: Something Something [mailto:[email protected]]
>>>> Sent: Friday, December 04, 2009 2:28 AM
>>>> To: [email protected]
>>>> Subject: Starting HBase in fully distributed mode...
>>>>
>>>> Hello,
>>>>
>>>> I am trying to get Hadoop/HBase up and running in a fully distributed
>>>> mode. For now, I have only *1 Master & 2 Slaves*.
>>>>
>>>> The Hadoop starts correctly.. I think. The only exception I see in
>>>> various log files is this one...
>>>>
>>>>
>>>> org.apache.hadoop.ipc.RemoteException:
>>>> org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete
>>>> /ebs1/mapred/system,/ebs2/mapred/system. Name node is in safe mode.
>>>> The ratio of reported blocks 0.0000 has not reached the threshold 0.9990.
>>>> *Safe mode will be turned off automatically*.
>>>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:1696)
>>>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:1676)
>>>>     at org.apache.hadoop.hdfs.server.namenode.NameNode.delete(NameNode.java:517)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>
>>>>
>>>> Somehow this doesn't sound critical, so I assumed everything was good to
>>>> go with Hadoop.
>>>>
>>>>
>>>> So then I started HBase and opened a shell (hbase shell). So far
>>>> everything looks good. Now when I try to run a 'list' command, I keep
>>>> getting this message:
>>>>
>>>> Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
>>>> KeeperErrorCode = *NoNode for /hbase/master*
>>>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>>>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>>>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:892)
>>>>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:328)
>>>>
>>>>
>>>> Here's what I have in my *Master hbase-site.xml*
>>>>
>>>> <configuration>
>>>>   <property>
>>>>     <name>hbase.rootdir</name>
>>>>     <value>hdfs://master:54310/hbase</value>
>>>>   </property>
>>>>   <property>
>>>>     <name>hbase.cluster.distributed</name>
>>>>     <value>true</value>
>>>>   </property>
>>>>   <property>
>>>>     <name>hbase.zookeeper.property.clientPort</name>
>>>>     <value>2181</value>
>>>>   </property>
>>>>   <property>
>>>>     <name>hbase.zookeeper.quorum</name>
>>>>     <value>master,slave1,slave2</value>
>>>>   </property>
>>>>   <property>
>>>>
>>>>
>>>>
>>>> The *Slave* hbase-site.xml are set as follows:
>>>>
>>>>   <property>
>>>>     <name>hbase.rootdir</name>
>>>>     <value>hdfs://master:54310/hbase</value>
>>>>   </property>
>>>>   <property>
>>>>     <name>hbase.cluster.distributed</name>
>>>>     <value>false</value>
>>>>   </property>
>>>>   <property>
>>>>     <name>hbase.zookeeper.property.clientPort</name>
>>>>     <value>2181</value>
>>>>   </property>
>>>>
>>>>
>>>> In the hbase-env.sh file on ALL 3 machines I have set the JAVA_HOME and
>>>> set the HBase classpath as follows:
>>>>
>>>> export HBASE_CLASSPATH=$HBASE_CLASSPATH:/ebs1/hadoop-0.20.1/conf
>>>>
>>>>
>>>> On *Master* I have added Master & Slaves IP hostnames to *regionservers*
>>>> file.
>>>> On *slaves*, the regionservers file is empty.
>>>>
>>>>
>>>> I have run hadoop namenode -format multiple times, but still keep
>>>> getting..
>>>> "NoNode for /hbase/master". What step did I miss? Thanks for your
>>>> help.
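
J-D's advice in this thread was to make sure each process listens where its clients expect it. One rough way to check that is to attempt a plain TCP connection to the ZooKeeper client port on every quorum host, from each node in turn. The sketch below is only an illustration, not part of HBase or ZooKeeper: the helper names can_connect and check_quorum are made up for this example, and the hostnames (master, slave1, slave2) and port 2181 are simply the values from the hbase-site.xml quoted above. On EC2 you would presumably substitute the private hostnames.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration. A refused connection, a timeout,
    or a hostname that does not resolve all count as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError, socket.timeout and socket.gaierror
        # are all subclasses of OSError.
        return False


def check_quorum(hosts, port=2181, timeout=3.0):
    """Map each host to whether its ZooKeeper client port accepted a connection."""
    return {host: can_connect(host, port, timeout) for host in hosts}


if __name__ == "__main__":
    # Assumed values, copied from the hbase-site.xml in this thread.
    quorum = ["master", "slave1", "slave2"]
    for host, ok in check_quorum(quorum).items():
        status = "reachable" if ok else "refused or unreachable"
        print(f"{host}:2181 -> {status}")
```

A True result only shows that something accepts TCP connections on that port, not that it is a healthy ZooKeeper server. A False result for a hostname that the master or a regionserver is configured to use would be consistent with the "Connection refused" warnings in the logs above.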
