Hi Jason,

Thank you very much for your reply. The namenode is running and is out of safe mode, and there are sufficient datanodes running. I even reduced the replication level to 1, while I have 3 datanodes running.
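For reference, I verified this along these lines (0.20-era commands; a sketch, not the exact transcript):

    # confirm the namenode answers and is out of safe mode
    hadoop dfsadmin -safemode get
    # confirm the datanodes have registered and report capacity
    hadoop dfsadmin -report

and the replication level was lowered in conf/hdfs-site.xml:

    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>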
Here is what I am seeing in the log files.

In the namenode log file:

2009-12-20 21:37:22,620 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /${hadoop.tmp.dir}/mapred/system/jobtracker.info. blk_7768839494616930267_1004

In the jobtracker log file:

2009-12-20 21:37:22,591 WARN org.apache.hadoop.mapred.JobTracker: Retrying...
2009-12-20 21:38:25,633 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: 63000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/192.168.130.108:50010]
2009-12-20 21:38:25,633 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_7768839494616930267_1004
2009-12-20 21:38:25,637 INFO org.apache.hadoop.hdfs.DFSClient: Waiting to find target node: 192.168.130.108:50010

I understand from the namenode log message that the block is created, but in the jobtracker log it still appears to fail. Any suggestions? Could it be a network problem? (A quick way to check that is sketched at the end of this message.) This is not my first time using Hadoop, but it is my first time seeing these errors.

The cluster I am currently setting up for Hadoop is configured to use NAT for the connection between the head node and the other nodes. I am not working on the head node, so this should not affect me. I made sure that all the nodes are able to ssh to each other (not only to the master node). The home directory is mounted on shared storage on all machines, so adding any node to known_hosts allows all the other nodes to ssh to it.

Thanks
Iman

________________________________
From: Jason Venner <jason.had...@gmail.com>
To: hdfs-user@hadoop.apache.org
Sent: Sat, December 19, 2009 3:46:14 PM
Subject: Re: hdfs error when starting the jobtracker

A common set of reasons for the jobtracker not starting are:
1) namenode not running
2) namenode not out of safe mode
2.1) no / insufficient datanodes running

On Thu, Dec 17, 2009 at 7:36 PM, Iman E <hadoop_...@yahoo.com> wrote:

>Hi,
>I have a basic question about hadoop configuration. Whenever I try to start the jobtracker, it remains in "initializing" mode forever, and when I checked the log file I found the following errors.
>
>Several lines like these, for different slaves in my cluster:
>
>2009-12-17 17:47:43,717 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: 66000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/XXX.XXX.XXX.XXX:50010]
>2009-12-17 17:47:43,717 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_7740448897934265604_1010
>2009-12-17 17:47:43,720 INFO org.apache.hadoop.hdfs.DFSClient: Waiting to find target node: XXX.XXX.XXX.XXX:50010
>
>then
>
>2009-12-17 17:47:49,727 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2812)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)
>2009-12-17 17:47:49,728 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_7740448897934265604_1010 bad datanode[0] nodes == null
>2009-12-17 17:47:49,728 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "${mapred.system.dir}/mapred/system/jobtracker.info" - Aborting...
>2009-12-17 17:47:49,728 WARN org.apache.hadoop.mapred.JobTracker: Writing to file ${fs.default.name}/${mapred.system.dir}/mapred/system/jobtracker.info failed!
>2009-12-17 17:47:49,728 WARN org.apache.hadoop.mapred.JobTracker: FileSystem is not ready yet!
>2009-12-17 17:47:49,749 WARN org.apache.hadoop.mapred.JobTracker: Failed to initialize recovery manager.
>java.net.SocketTimeoutException: 66000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/XXX.XXX.XXX.XXX:50010]
>        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:213)
>        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2837)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2793)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)
>2009-12-17 17:47:59,757 WARN org.apache.hadoop.mapred.JobTracker: Retrying...
>
>Then it starts all over again.
>
>I am not sure what the reason for this error is. I tried leaving mapred.system.dir at its default value, and also overriding it in mapred-site.xml with both local and shared directories, but to no avail. In all cases this error shows up in the log file: Writing to file ${fs.default.name}/${mapred.system.dir}/mapred/system/jobtracker.info failed! Is it true that hadoop appends these values together? What should I do to avoid this? Does anyone know what I am doing wrong or what could be causing these errors?
>
>Thanks

--
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
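Regarding the network question raised in the reply at the top: a minimal check is to test, from the machine running the jobtracker, whether the datanode data-transfer port shown in the log is reachable. This is only a sketch and assumes nc (netcat) is installed on that node; telnet to the same host and port works just as well:

    # run on the jobtracker machine; host/port taken from the jobtracker log above
    nc -vz 192.168.130.108 50010

If this connect also times out, the failure is at the network level between the jobtracker node and the datanodes (the NAT setup described above would be the first place to look) rather than inside HDFS itself.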