Hi Artyom,

>> I'm not totally sure, but I think this exception occurs when there is
>> no HDFS data node available in the cluster.
>>
>> Can you access the HDFS name node status screen at
>> <http://servers-ip:50070/> from a web browser to see if there is a
>> data node available?
> Yes, the HDFS name node status is accessible and the data node is available
> through a web browser using the url <http://servers-ip:50070/>.
>
> Could you provide some examples of when a data node is not available in
> the cluster and for the HBase master?

I happen to have an Ubuntu 9.04 virtual server installation, so I set up
HDFS on it to see if I could reproduce the exception you had. I found I can
easily reproduce it with the following steps:

1. Delete the hadoop data directory
2. bin/hadoop namenode -format
3. bin/start-all.sh
   -> The namenode starts immediately and goes into service, but the data
      node makes a long (almost seven-minute) pause in the middle of its
      startup.
4. Before the data node becomes ready, do an HDFS write operation
   (e.g. "bin/hadoop fs -put conf input"); the write operation will then
   fail with the following error:

------------------------------------------------
09/10/28 09:00:19 WARN hdfs.DFSClient: DataStreamer Exception:
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
/user/tatsuya/input/capacity-scheduler.xml could only be replicated to 0
nodes, instead of 1
...
09/10/28 09:00:19 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
09/10/28 09:00:19 WARN hdfs.DFSClient: Could not get block locations.
Source file "/user/tatsuya/input/capacity-scheduler.xml" - Aborting...
------------------------------------------------

This doesn't seem to be the desired behavior of HDFS; shouldn't HDFS stay
in safe mode while no data node is ready?

Also, if I skip steps #1 and #2, the problem doesn't happen. The data node
still makes the long pause at startup, but the HDFS cluster starts in safe
mode and waits for the data node to become ready. HBase deals with HDFS
safe mode, so HBase should work fine in this case.

Can you check if this is your case? If so, you can avoid the problem by not
running "start-hbase.sh" until HDFS has its data nodes available.
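For example, something like this should work (just a rough, untested sketch
on my part; it assumes the stock "hadoop dfsadmin" commands from Hadoop 0.20
and the usual script locations in your install):

------------------------------------------------
bin/start-all.sh

# Block until the name node has left safe mode.
bin/hadoop dfsadmin -safemode wait

# On a freshly formatted HDFS, safe mode can end before any data node has
# registered, so also check the report for at least one live data node.
bin/hadoop dfsadmin -report

# Start HBase only once the report shows a live data node.
bin/start-hbase.sh
------------------------------------------------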
I have done a little more investigation into why the data node makes the
long pause on Ubuntu 9.04. It seems there is a problem with the SUN JRE
SecureRandom implementation on Linux, and this causes Jetty (used in the
data node) to be slow to create its session ID manager.

Here is the data node log, with a seven-minute pause while it's trying to
start Jetty:

------------------------------------------------
2009-10-28 09:00:10,559 INFO org.mortbay.log: jetty-6.1.14
2009-10-28 09:06:54,165 INFO org.mortbay.log: Started [email protected]:50075
------------------------------------------------

Here is part of a full thread dump; sun.security.provider.SecureRandom is
taking a long time (forever?) to finish:

------------------------------------------------
"main" prio=10 tid=0x00000000409a8800 nid=0xba2 runnable [0x00007ff762a32000]
   java.lang.Thread.State: RUNNABLE
        at java.io.FileInputStream.readBytes(Native Method)
        ...
        - locked <0x00007ff749edfbb8> (a java.io.BufferedInputStream)
        at sun.security.provider.SeedGenerator$URLSeedGenerator.getSeedByte(SeedGenerator.java:453)
        at sun.security.provider.SeedGenerator.getSeedBytes(SeedGenerator.java:123)
        at sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:118)
        at sun.security.provider.SecureRandom.engineGenerateSeed(SecureRandom.java:114)
        at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:171)
        - locked <0x00007ff749edf388> (a sun.security.provider.SecureRandom)
        at java.security.SecureRandom.nextBytes(SecureRandom.java:433)
        - locked <0x00007ff749edf6b8> (a java.security.SecureRandom)
        at java.security.SecureRandom.next(SecureRandom.java:455)
        at java.util.Random.nextLong(Random.java:284)
        at org.mortbay.jetty.servlet.HashSessionIdManager.doStart(HashSessionIdManager.java:139)
        ...
        at org.apache.hadoop.http.HttpServer.start(HttpServer.java:460)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:375)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:216)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1283)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1238)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1246)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1368)
------------------------------------------------

And I found this is a known issue in Jetty:
http://jira.codehaus.org/browse/JETTY-331

It says you could work around it by changing the Jetty setting to use
"java.util.Random" instead of "sun.security.provider.SecureRandom". I don't
know if this is the correct way to work around it; I'd better ask the HDFS
folks on the hdfs-user mailing list for a solution. (I'm currently not a
member of the mailing list.)
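In the meantime, two things you could try on your box (these are just my
own guesses, not something from the Jetty ticket or from the HDFS folks):
check whether the kernel's entropy pool is nearly empty while the data node
is starting, and if it is, point the JVM's seed source at the non-blocking
urandom device. I've seen the latter suggested for this kind of SecureRandom
stall, but I haven't verified it against Hadoop or Jetty myself; the extra
"/./" in the path is deliberate.

------------------------------------------------
# How much entropy does the kernel have right now? Values close to zero
# while Jetty is starting would point at a blocking /dev/random read.
cat /proc/sys/kernel/random/entropy_avail

# Possible workaround (unverified): seed the JVM's SecureRandom from the
# non-blocking urandom device, e.g. by adding this to conf/hadoop-env.sh.
export HADOOP_OPTS="$HADOOP_OPTS -Djava.security.egd=file:/dev/./urandom"
------------------------------------------------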
Hope this helps,

--
Tatsuya Kawano (Mr.)
Tokyo, Japan


On Wed, Oct 28, 2009 at 7:12 AM, Artyom Shvedchikov <[email protected]> wrote:
> Hello, Tatsuya
> Thank you for the fast assistance.
>
>> I'm not totally sure, but I think this exception occurs when there is
>> no HDFS data node available in the cluster.
>>
>> Can you access the HDFS name node status screen at
>> <http://servers-ip:50070/> from a web browser to see if there is a
>> data node available?
>>
>
> Yes, the HDFS name node status is accessible and the data node is available
> through a web browser using the url <http://servers-ip:50070/>.
>
> Could you provide some examples of when a data node is not available in
> the cluster and for the HBase master?
> -------------------------------------------------
> Best wishes, Artyom Shvedchikov
>
>
> On Tue, Oct 27, 2009 at 10:01 AM, Tatsuya Kawano
> <[email protected]> wrote:
>
>> Hi Artyom,
>>
>> Your configuration files look just fine.
>>
>> >> 2009-10-26 13:34:30,031 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer
>> >> Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
>> >> /hbase.version could only be replicated to 0 nodes, instead of 1
>>
>> I'm not totally sure, but I think this exception occurs when there is
>> no HDFS data node available in the cluster.
>>
>> Can you access the HDFS name node status screen at
>> <http://servers-ip:50070/> from a web browser to see if there is a
>> data node available?
>>
>> Thanks,
>>
>> --
>> Tatsuya Kawano (Mr.)
>> Tokyo, Japan
>>
>>
>> On Tue, Oct 27, 2009 at 11:24 AM, Artyom Shvedchikov <[email protected]>
>> wrote:
>> > Hello.
>> >
>> > We are testing the latest HBase 0.20.1 in pseudo-distributed mode with
>> > Hadoop 0.20.1 on the following environment:
>> > *h/w*: Intel C2D 1.86 GHz, RAM 2 GB 667 MHz, HDD 1 TB Seagate SATA2 7200 rpm
>> > *s/w*: Ubuntu 9.04, filesystem type is *ext3*, Java 1.6.0_16-b01, Hadoop
>> > 0.20.1, HBase 0.20.1
>> >
>> > File */etc/hosts*:
>> >
>> >> 127.0.0.1 localhost
>> >>
>> >> # The following lines are desirable for IPv6 capable hosts
>> >> ::1 localhost ip6-localhost ip6-loopback
>> >> fe00::0 ip6-localnet
>> >> ff00::0 ip6-mcastprefix
>> >> ff02::1 ip6-allnodes
>> >> ff02::2 ip6-allrouters
>> >> ff02::3 ip6-allhosts
>> >
>> > Hadoop and HBase are running in pseudo-distributed mode.
>> > Two options added to *hadoop-env.sh*:
>> >
>> >> export JAVA_HOME=/usr/lib/jvm/java-6-sun
>> >> export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
>> >
>> > *core-site.xml*:
>> >
>> >> <configuration>
>> >>   <property>
>> >>     <name>fs.default.name</name>
>> >>     <value>hdfs://127.0.0.1:9000</value>
>> >>   </property>
>> >>   <property>
>> >>     <name>hadoop.tmp.dir</name>
>> >>     <value>/hadoop/tmp/hadoop-${user.name}</value>
>> >>     <description>A base for other temporary directories.</description>
>> >>   </property>
>> >> </configuration>
>> >
>> > *hdfs-site.xml*:
>> >
>> >> <configuration>
>> >>   <property>
>> >>     <name>dfs.replication</name>
>> >>     <value>1</value>
>> >>   </property>
>> >>   <property>
>> >>     <name>dfs.name.dir</name>
>> >>     <value>/hadoop/hdfs/name</value>
>> >>   </property>
>> >>   <property>
>> >>     <name>dfs.data.dir</name>
>> >>     <value>/hadoop/hdfs/data</value>
>> >>   </property>
>> >>   <property>
>> >>     <name>dfs.datanode.socket.write.timeout</name>
>> >>     <value>0</value>
>> >>   </property>
>> >>   <property>
>> >>     <name>dfs.datanode.max.xcievers</name>
>> >>     <value>1023</value>
>> >>   </property>
>> >> </configuration>
>> >
>> > *mapred-site.xml*:
>> >
>> >> <configuration>
>> >>   <property>
>> >>     <name>mapred.job.tracker</name>
>> >>     <value>127.0.0.1:9001</value>
>> >>   </property>
>> >> </configuration>
>> >
>> > *hbase-site.xml*:
>> >
>> >> <configuration>
>> >>   <property>
>> >>     <name>hbase.rootdir</name>
>> >>     <value>hdfs://localhost:9000/</value>
>> >>     <description>The directory shared by region servers.
>> >>     Should be fully-qualified to include the filesystem to use.
>> >>     E.g: hdfs://NAMENODE_SERVER:PORT/HBASE_ROOTDIR
>> >>     </description>
>> >>   </property>
>> >>   <property>
>> >>     <name>hbase.master</name>
>> >>     <value>127.0.0.1:60000</value>
>> >>     <description>The host and port that the HBase master runs at.
>> >>     </description>
>> >>   </property>
>> >>   <property>
>> >>     <name>hbase.tmp.dir</name>
>> >>     <value>/hadoop/tmp/hbase-${user.name}</value>
>> >>     <description>Temporary directory on the local filesystem.</description>
>> >>   </property>
>> >>   <property>
>> >>     <name>hbase.zookeeper.quorum</name>
>> >>     <value>127.0.0.1</value>
>> >>     <description>The directory shared by region servers.</description>
>> >>   </property>
>> >> </configuration>
>> >
>> > Hadoop and HBase are running under the *hbase* user, and all necessary
>> > directories are owned by the *hbase* user (I mean the */hadoop* directory
>> > and all its subdirectories).
>> >
>> > The first launch was successful, but after several days of work we ran into
>> > a problem: the HBase master was down. We tried to restart it
>> > (*stop-hbase.sh*, then *start-hbase.sh*), but the restart fails with this
>> > error:
>> >
>> >> 2009-10-26 13:34:30,031 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer
>> >> Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
>> >> /hbase.version could only be replicated to 0 nodes, instead of 1
>> >>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1267)
>> >>   at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
>> >
>> > Then I tried to reformat HDFS (and also removed all Hadoop and HBase data,
>> > then formatted HDFS again) and started Hadoop and HBase again, but the
>> > HBase master fails to start with the same error.
>> >
>> > Could someone review our configuration and tell us what the reason is for
>> > such behaviour of the HBase master instance?
>> >
>> > Thanks in advance, Artyom
>> > -------------------------------------------------
>> > Best wishes, Artyom Shvedchikov
>>
