Hi Artyom,

>> I'm not totally sure, but I think this exception occurs when there is
>> no HDFS data node available in the cluster.
>>
>> Can you access the HDFS name node status screen at
>> <http://servers-ip:50070/> from a web browser to see if there is a
>> data node available?
> Yes, the HDFS name node status is accessible and the data node is available
> through a web browser using the url <http://servers-ip:50070/>.
>
> Could you provide some examples of when a data node is not available in
> the cluster and for the HBase master?

I happen to have an Ubuntu 9.04 virtual server installation, so I set up
HDFS on it to see if I could reproduce the exception you had. I found I can
easily reproduce it with the following steps:

1. Delete the hadoop data directory
2. bin/hadoop namenode -format
3. bin/start-all.sh
   -> The namenode starts immediately and goes into service, but the data
      node makes a long (almost seven-minute) pause in the middle of its
      startup.
4. Before the data node becomes ready, do an HDFS write operation
   (e.g. "bin/hadoop fs -put conf input"); the write operation will then
   fail with the following error:

------------------------------------------------
09/10/28 09:00:19 WARN hdfs.DFSClient: DataStreamer Exception:
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
/user/tatsuya/input/capacity-scheduler.xml could only be replicated to 0
nodes, instead of 1
...
09/10/28 09:00:19 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
09/10/28 09:00:19 WARN hdfs.DFSClient: Could not get block locations.
Source file "/user/tatsuya/input/capacity-scheduler.xml" - Aborting...
------------------------------------------------

This doesn't seem to be the desired behavior of HDFS; shouldn't HDFS stay
in safe mode while no data node is ready?

Also, if I skip steps #1 and #2, the problem doesn't happen. The data node
still makes the long pause at startup, but the HDFS cluster starts in safe
mode and waits for the data node to become ready. HBase deals with HDFS
safe mode, so HBase should work fine in this case.

Can you check if this is your case? If so, you can avoid the problem by not
running "start-hbase.sh" until HDFS has its data nodes available.
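For example, something like this should work (just a rough, untested sketch
on my part; it assumes the stock "hadoop dfsadmin" commands from Hadoop 0.20
and the usual script locations in your install):

------------------------------------------------
bin/start-all.sh

# Block until the name node has left safe mode.
bin/hadoop dfsadmin -safemode wait

# On a freshly formatted HDFS, safe mode can end before any data node has
# registered, so also check the report for at least one live data node.
bin/hadoop dfsadmin -report

# Start HBase only once the report shows a live data node.
bin/start-hbase.sh
------------------------------------------------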
I have done a little more investigation into why the data node makes the
long pause on Ubuntu 9.04. It seems there is a problem with the SUN JRE
SecureRandom implementation on Linux, and this causes Jetty (used in the
data node) to be slow to create its session ID manager.

Here is the data node log, with a seven-minute pause while it's trying to
start Jetty:

------------------------------------------------
2009-10-28 09:00:10,559 INFO org.mortbay.log: jetty-6.1.14
2009-10-28 09:06:54,165 INFO org.mortbay.log: Started [email protected]:50075
------------------------------------------------

Here is part of a full thread dump; sun.security.provider.SecureRandom is
taking a long time (forever?) to finish:

------------------------------------------------
"main" prio=10 tid=0x00000000409a8800 nid=0xba2 runnable [0x00007ff762a32000]
   java.lang.Thread.State: RUNNABLE
        at java.io.FileInputStream.readBytes(Native Method)
        ...
        - locked <0x00007ff749edfbb8> (a java.io.BufferedInputStream)
        at sun.security.provider.SeedGenerator$URLSeedGenerator.getSeedByte(SeedGenerator.java:453)
        at sun.security.provider.SeedGenerator.getSeedBytes(SeedGenerator.java:123)
        at sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:118)
        at sun.security.provider.SecureRandom.engineGenerateSeed(SecureRandom.java:114)
        at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:171)
        - locked <0x00007ff749edf388> (a sun.security.provider.SecureRandom)
        at java.security.SecureRandom.nextBytes(SecureRandom.java:433)
        - locked <0x00007ff749edf6b8> (a java.security.SecureRandom)
        at java.security.SecureRandom.next(SecureRandom.java:455)
        at java.util.Random.nextLong(Random.java:284)
        at org.mortbay.jetty.servlet.HashSessionIdManager.doStart(HashSessionIdManager.java:139)
        ...
        at org.apache.hadoop.http.HttpServer.start(HttpServer.java:460)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:375)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:216)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1283)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1238)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1246)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1368)
------------------------------------------------

And I found this is a known issue in Jetty:
http://jira.codehaus.org/browse/JETTY-331

It says you could work around it by changing the Jetty setting to use
"java.util.Random" instead of "sun.security.provider.SecureRandom". I don't
know if this is the correct way to work around it; I'd better ask the HDFS
folks on the hdfs-user mailing list for a solution. (I'm currently not a
member of the mailing list.)
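In the meantime, two things you could try on your box (these are just my
own guesses, not something from the Jetty ticket or from the HDFS folks):
check whether the kernel's entropy pool is nearly empty while the data node
is starting, and if it is, point the JVM's seed source at the non-blocking
urandom device. I've seen the latter suggested for this kind of SecureRandom
stall, but I haven't verified it against Hadoop or Jetty myself; the extra
"/./" in the path is deliberate.

------------------------------------------------
# How much entropy does the kernel have right now? Values close to zero
# while Jetty is starting would point at a blocking /dev/random read.
cat /proc/sys/kernel/random/entropy_avail

# Possible workaround (unverified): seed the JVM's SecureRandom from the
# non-blocking urandom device, e.g. by adding this to conf/hadoop-env.sh.
export HADOOP_OPTS="$HADOOP_OPTS -Djava.security.egd=file:/dev/./urandom"
------------------------------------------------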
Hope this helps,

--
Tatsuya Kawano (Mr.)
Tokyo, Japan


On Wed, Oct 28, 2009 at 7:12 AM, Artyom Shvedchikov <[email protected]> wrote:
> Hello, Tatsuya
> Thank you for the fast assistance.
>
>> I'm not totally sure, but I think this exception occurs when there is
>> no HDFS data node available in the cluster.
>>
>> Can you access the HDFS name node status screen at
>> <http://servers-ip:50070/> from a web browser to see if there is a
>> data node available?
>>
>
> Yes, the HDFS name node status is accessible and the data node is available
> through a web browser using the url <http://servers-ip:50070/>.
>
> Could you provide some examples of when a data node is not available in
> the cluster and for the HBase master?
> -------------------------------------------------
> Best wishes, Artyom Shvedchikov
>
>
> On Tue, Oct 27, 2009 at 10:01 AM, Tatsuya Kawano
> <[email protected]> wrote:
>
>> Hi Artyom,
>>
>> Your configuration files look just fine.
>>
>> >> 2009-10-26 13:34:30,031 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer
>> >> Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
>> >> /hbase.version could only be replicated to 0 nodes, instead of 1
>>
>> I'm not totally sure, but I think this exception occurs when there is
>> no HDFS data node available in the cluster.
>>
>> Can you access the HDFS name node status screen at
>> <http://servers-ip:50070/> from a web browser to see if there is a
>> data node available?
>>
>> Thanks,
>>
>> --
>> Tatsuya Kawano (Mr.)
>> Tokyo, Japan
>>
>>
>> On Tue, Oct 27, 2009 at 11:24 AM, Artyom Shvedchikov <[email protected]>
>> wrote:
>> > Hello.
>> >
>> > We are testing the latest HBase 0.20.1 in pseudo-distributed mode with
>> > Hadoop 0.20.1 on the following environment:
>> > *h/w*: Intel C2D 1.86 GHz, RAM 2 GB 667 MHz, HDD 1 TB Seagate SATA2 7200 rpm
>> > *s/w*: Ubuntu 9.04, filesystem type is *ext3*, Java 1.6.0_16-b01, Hadoop
>> > 0.20.1, HBase 0.20.1
>> >
>> > File */etc/hosts*:
>> >
>> >> 127.0.0.1 localhost
>> >>
>> >> # The following lines are desirable for IPv6 capable hosts
>> >> ::1 localhost ip6-localhost ip6-loopback
>> >> fe00::0 ip6-localnet
>> >> ff00::0 ip6-mcastprefix
>> >> ff02::1 ip6-allnodes
>> >> ff02::2 ip6-allrouters
>> >> ff02::3 ip6-allhosts
>> >
>> > Hadoop and HBase are running in pseudo-distributed mode.
>> > Two options added to *hadoop-env.sh*:
>> >
>> >> export JAVA_HOME=/usr/lib/jvm/java-6-sun
>> >> export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
>> >
>> > *core-site.xml*:
>> >
>> >> <configuration>
>> >>   <property>
>> >>     <name>fs.default.name</name>
>> >>     <value>hdfs://127.0.0.1:9000</value>
>> >>   </property>
>> >>   <property>
>> >>     <name>hadoop.tmp.dir</name>
>> >>     <value>/hadoop/tmp/hadoop-${user.name}</value>
>> >>     <description>A base for other temporary directories.</description>
>> >>   </property>
>> >> </configuration>
>> >
>> > *hdfs-site.xml*:
>> >
>> >> <configuration>
>> >>   <property>
>> >>     <name>dfs.replication</name>
>> >>     <value>1</value>
>> >>   </property>
>> >>   <property>
>> >>     <name>dfs.name.dir</name>
>> >>     <value>/hadoop/hdfs/name</value>
>> >>   </property>
>> >>   <property>
>> >>     <name>dfs.data.dir</name>
>> >>     <value>/hadoop/hdfs/data</value>
>> >>   </property>
>> >>   <property>
>> >>     <name>dfs.datanode.socket.write.timeout</name>
>> >>     <value>0</value>
>> >>   </property>
>> >>   <property>
>> >>     <name>dfs.datanode.max.xcievers</name>
>> >>     <value>1023</value>
>> >>   </property>
>> >> </configuration>
>> >
>> > *mapred-site.xml*:
>> >
>> >> <configuration>
>> >>   <property>
>> >>     <name>mapred.job.tracker</name>
>> >>     <value>127.0.0.1:9001</value>
>> >>   </property>
>> >> </configuration>
>> >
>> > *hbase-site.xml*:
>> >
>> >> <configuration>
>> >>   <property>
>> >>     <name>hbase.rootdir</name>
>> >>     <value>hdfs://localhost:9000/</value>
>> >>     <description>The directory shared by region servers.
>> >>     Should be fully-qualified to include the filesystem to use.
>> >>     E.g: hdfs://NAMENODE_SERVER:PORT/HBASE_ROOTDIR
>> >>     </description>
>> >>   </property>
>> >>   <property>
>> >>     <name>hbase.master</name>
>> >>     <value>127.0.0.1:60000</value>
>> >>     <description>The host and port that the HBase master runs at.
>> >>     </description>
>> >>   </property>
>> >>   <property>
>> >>     <name>hbase.tmp.dir</name>
>> >>     <value>/hadoop/tmp/hbase-${user.name}</value>
>> >>     <description>Temporary directory on the local filesystem.</description>
>> >>   </property>
>> >>   <property>
>> >>     <name>hbase.zookeeper.quorum</name>
>> >>     <value>127.0.0.1</value>
>> >>     <description>The directory shared by region servers.</description>
>> >>   </property>
>> >> </configuration>
>> >
>> > Hadoop and HBase are running under the *hbase* user, and all necessary
>> > directories are owned by the *hbase* user (I mean the */hadoop* directory
>> > and all its subdirectories).
>> >
>> > The first launch was successful, but after several days of work we ran into
>> > a problem: the HBase master was down. We tried to restart it
>> > (*stop-hbase.sh*, then *start-hbase.sh*), but the restart fails with this
>> > error:
>> >
>> >> 2009-10-26 13:34:30,031 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer
>> >> Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
>> >> /hbase.version could only be replicated to 0 nodes, instead of 1
>> >>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1267)
>> >>   at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
>> >
>> > Then I tried to reformat HDFS (and also removed all Hadoop and HBase data,
>> > then formatted HDFS again) and started Hadoop and HBase again, but the
>> > HBase master fails to start with the same error.
>> >
>> > Could someone review our configuration and tell us what the reason is for
>> > such behaviour of the HBase master instance?
>> >
>> > Thanks in advance, Artyom
>> > -------------------------------------------------
>> > Best wishes, Artyom Shvedchikov
>>
