Hi all,

I've been seeing an HDFS issue I don't understand, and I'm hoping someone else has run into it before.

I'm trying to set up a simple-stupid Hadoop 0.20.203.0 test cluster on two Dell PE1950s running a minimal installation of RHEL 5.6. The master node, wd0031, runs a NameNode, a DataNode and a SecondaryNameNode; a single slave node, wd0032, runs a DataNode. The Hadoop processes start up fine and I'm not seeing any errors in the log files, but the DataNodes never join the filesystem: there are no log entries in the NameNode about their registration, "hadoop fsck /" lists zero datanodes, and I can't write files. The config and log files, plus some ngrep traces, are up at https://gist.github.com/1149869 .

What's weird is that exactly the same configuration works on a two-node EC2 cluster running CentOS 5.6: the filesystem works, fsck lists the datanodes, and the logs show the expected entries. See https://gist.github.com/1149823 . As far as I can tell there should be no difference between these two cases. Jstack traces of a DataNode and NameNode for both cases, local and EC2, are here: https://gist.github.com/1149843 .

I'm a relative newbie to Hadoop, and I can't figure out why I'm having this problem on local hardware but not on EC2. Nothing in the logs or the jstacks is obvious to me, but hopefully someone who knows Hadoop better can spot it. Please let me know if you need more information.

Thanks,
Adam

--
Adam DeConinck | Applications Specialist | [email protected]
Enabling Innovation Through Fast and Flexible HPC Resources
R Systems NA, inc. | 1902 Fox Drive, Champaign, IL 61820 | 217.954.1056 | www.rsystemsinc.com
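P.S. If concrete commands would help, here's roughly the sanity check I can run on each node to narrow things down. The port 9000 below is only illustrative; the real NameNode port is whatever fs.default.name in the gist's core-site.xml points at.

    # On the master (wd0031): confirm all three daemons are actually running
    jps                          # expect NameNode, DataNode, SecondaryNameNode

    # Confirm the NameNode RPC port is listening on a routable address,
    # not just 127.0.0.1 (run as root to see process names)
    netstat -tlnp | grep java

    # Ask the NameNode how many live datanodes it sees
    hadoop dfsadmin -report

    # On the slave (wd0032): confirm the NameNode RPC port is reachable
    nc -z wd0031 9000 && echo "reachable"    # or: telnet wd0031 9000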
