Adam,

You've run into a fairly common issue with the 0.20.x releases: the
DataNode daemon reads from /dev/random at startup, and that read blocks
until the kernel has gathered enough entropy to satisfy it. If your DN
machines see some other activity (mouse, keyboard input on a terminal,
etc.), the DNs usually get unwedged and start after a while. On a
remote, single-purpose node such as yours, where there is little such
activity, this can take a long time.
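
If you want to check this on the affected node before changing anything,
a quick sketch (Linux only) is to read the kernel's own entropy estimate.
A value that stays low on an idle machine would be consistent with the
explanation above, though it is not proof by itself. The variable name
here is just for illustration.

```shell
# Linux only: read the kernel's current estimate of available entropy, in bits.
entropy_bits=$(cat /proc/sys/kernel/random/entropy_avail)
echo "entropy_avail: ${entropy_bits} bits"
```

Running it a few times while the DataNode is stuck shows whether the pool
is being refilled at all.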

As a workaround, you could add the following to the JVM args. The
'/dev/../dev/urandom' form is required because if you point it directly
at /dev/urandom, Java ignores your setting. (Hat tip: Brock Noland)

-Djava.security.egd=file:/dev/../dev/urandom
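
One way to apply the flag, as a sketch: in the 0.20.x tarball layout the
start scripts source conf/hadoop-env.sh, and HADOOP_DATANODE_OPTS is
passed only to the DataNode JVM. The file location is an assumption about
your install, so adjust it if your packaging puts the file elsewhere.

```shell
# conf/hadoop-env.sh (path assumed; depends on how Hadoop was installed)
# Prepend the workaround flag to whatever DataNode options are already set.
export HADOOP_DATANODE_OPTS="-Djava.security.egd=file:/dev/../dev/urandom $HADOOP_DATANODE_OPTS"
```

Setting it in HADOOP_OPTS instead would apply it to every Hadoop daemon
started from that node, which may be simpler on a small test cluster.
Restart the DataNodes afterwards so the new JVM args take effect.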

(Stack trace that confirms the blocking read):
"main" prio=10 tid=0x0000000001dda800 nid=0xcda runnable [0x0000000041257000]
  java.lang.Thread.State: RUNNABLE
       at java.io.FileInputStream.readBytes(Native Method)
       at java.io.FileInputStream.read(FileInputStream.java:236)
       at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
       at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
       - locked <0x00000000ed1f2958> (a java.io.BufferedInputStream)
       at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
       at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
       at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
       - locked <0x00000000ed1f27b8> (a java.io.BufferedInputStream)
       at sun.security.provider.SeedGenerator$URLSeedGenerator.getSeedByte(SeedGenerator.java:469)
       at sun.security.provider.SeedGenerator.getSeedBytes(SeedGenerator.java:140)
       at sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:135)
       at sun.security.provider.SecureRandom.engineGenerateSeed(SecureRandom.java:131)
       at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:188)
       - locked <0x00000000ed1f23d8> (a sun.security.provider.SecureRandom)
       at java.security.SecureRandom.nextBytes(SecureRandom.java:450)
       - locked <0x00000000ed1f2678> (a java.security.SecureRandom)
       at java.security.SecureRandom.next(SecureRandom.java:472)
       at java.util.Random.nextInt(Random.java:272)

For some further discussions on this, see/follow:
http://search-hadoop.com/m/pgnuO1wbQRQ

Btw, this has been fixed in trunk @
https://issues.apache.org/jira/browse/HDFS-1835

On Wed, Aug 17, 2011 at 1:18 AM, Adam DeConinck
<[email protected]> wrote:
> Hi all,
>
> I've been seeing an HDFS issue I don't understand, and I'm hoping
> someone else has seen this before.
>
> I'm currently attempting to set up a simple-stupid Hadoop 0.20.203.0
> test cluster on two Dell PE1950s running a minimal installation of RHEL
> 5.6.  The master node, wd0031, is running a NameNode, DataNode and
> SecondaryNameNode.  A single slave node, wd0032, is running a DataNode.
> The Hadoop processes are starting up fine and I'm not seeing any errors
> in the log files; but the DataNodes never join the filesystem.  There
> are never any log entries in the NameNode about their registration,
> doing a "hadoop fsck /" lists zero data-nodes, and I can't write files.
> The config and log files, and some ngrep traces, are up on
> https://gist.github.com/1149869 .
>
> What's weird is that exactly the same configuration works on a two-node
> EC2 cluster running CentOS 5.6: the filesystem works, fsck lists the
> datanodes, and the logs show the right entries.  See
> https://gist.github.com/1149823 .  As far as I can tell there should be
> no difference between these cases.
>
> Jstack traces on a DataNode and NameNode for both cases, local and EC2,
> are here: https://gist.github.com/1149843
>
> I'm a relative newbie to Hadoop, and I cannot figure out why I'm having
> this problem on local hardware but not EC2.  There's nothing in the logs
> or the jstacks which is obvious to me, but hopefully someone who knows
> Hadoop better can let me know.
>
> Please feel free to let me know if you need more information.
>
> Thanks,
> Adam
>
> --
> Adam DeConinck | Applications Specialist | [email protected]
>
> Enabling Innovation Through Fast and Flexible HPC Resources
>
> R Systems NA, inc. | 1902 Fox Drive, Champaign, IL 61820 | 217.954.1056 | 
> www.rsystemsinc.com
>
>
>



-- 
Harsh J
