Hi Jason,

Thank you very much for your reply. The namenode is running and is out of safe mode, and there are sufficient datanodes running. I even reduced the replication level to 1, while I have 3 datanodes running.
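For reference, I verified this along these lines (0.20-era commands; a sketch, not the exact transcript):

    # confirm the namenode answers and is out of safe mode
    hadoop dfsadmin -safemode get
    # confirm the datanodes have registered and report capacity
    hadoop dfsadmin -report

and the replication level was lowered in conf/hdfs-site.xml:

    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>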
Here is what I am seeing in the log files.

In the namenode log file:

2009-12-20 21:37:22,620 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /${hadoop.tmp.dir}/mapred/system/jobtracker.info. blk_7768839494616930267_1004

In the jobtracker log file:

2009-12-20 21:37:22,591 WARN org.apache.hadoop.mapred.JobTracker: Retrying...
2009-12-20 21:38:25,633 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: 63000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/192.168.130.108:50010]
2009-12-20 21:38:25,633 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_7768839494616930267_1004
2009-12-20 21:38:25,637 INFO org.apache.hadoop.hdfs.DFSClient: Waiting to find target node: 192.168.130.108:50010

I understand from the namenode log message that the block is created, but in the jobtracker log it still appears to fail. Any suggestions? Could it be a network problem? (A quick way to check that is sketched at the end of this message.) This is not my first time using Hadoop, but it is my first time seeing these errors.

The cluster I am currently setting up for Hadoop is configured to use NAT for the connection between the head node and the other nodes. I am not working on the head node, so this should not affect me. I made sure that all the nodes are able to ssh to each other (not only to the master node). The home directory is mounted on shared storage on all machines, so adding any node to known_hosts allows all the other nodes to ssh to it.

Thanks
Iman

________________________________
From: Jason Venner <jason.had...@gmail.com>
To: hdfs-user@hadoop.apache.org
Sent: Sat, December 19, 2009 3:46:14 PM
Subject: Re: hdfs error when starting the jobtracker

A common set of reasons for the jobtracker not starting are:
1) namenode not running
2) namenode not out of safe mode
2.1) no / insufficient datanodes running

On Thu, Dec 17, 2009 at 7:36 PM, Iman E <hadoop_...@yahoo.com> wrote:

>Hi,
>I have a basic question about hadoop configuration. Whenever I try to start the jobtracker, it remains in "initializing" mode forever, and when I checked the log file I found the following errors.
>
>Several lines like these, for different slaves in my cluster:
>
>2009-12-17 17:47:43,717 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: 66000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/XXX.XXX.XXX.XXX:50010]
>2009-12-17 17:47:43,717 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_7740448897934265604_1010
>2009-12-17 17:47:43,720 INFO org.apache.hadoop.hdfs.DFSClient: Waiting to find target node: XXX.XXX.XXX.XXX:50010
>
>then
>
>2009-12-17 17:47:49,727 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2812)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)
>2009-12-17 17:47:49,728 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_7740448897934265604_1010 bad datanode[0] nodes == null
>2009-12-17 17:47:49,728 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "${mapred.system.dir}/mapred/system/jobtracker.info" - Aborting...
>2009-12-17 17:47:49,728 WARN org.apache.hadoop.mapred.JobTracker: Writing to file ${fs.default.name}/${mapred.system.dir}/mapred/system/jobtracker.info failed!
>2009-12-17 17:47:49,728 WARN org.apache.hadoop.mapred.JobTracker: FileSystem is not ready yet!
>2009-12-17 17:47:49,749 WARN org.apache.hadoop.mapred.JobTracker: Failed to initialize recovery manager.
>java.net.SocketTimeoutException: 66000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/XXX.XXX.XXX.XXX:50010]
>        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:213)
>        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2837)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2793)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)
>2009-12-17 17:47:59,757 WARN org.apache.hadoop.mapred.JobTracker: Retrying...
>
>Then it starts all over again.
>
>I am not sure what the reason for this error is. I tried leaving mapred.system.dir at its default value, and also overriding it in mapred-site.xml with both local and shared directories, but to no avail. In all cases this error shows up in the log file: Writing to file ${fs.default.name}/${mapred.system.dir}/mapred/system/jobtracker.info failed! Is it true that hadoop appends these values together? What should I do to avoid this? Does anyone know what I am doing wrong or what could be causing these errors?
>
>Thanks

--
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
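Regarding the network question raised in the reply at the top: a minimal check is to test, from the machine running the jobtracker, whether the datanode data-transfer port shown in the log is reachable. This is only a sketch and assumes nc (netcat) is installed on that node; telnet to the same host and port works just as well:

    # run on the jobtracker machine; host/port taken from the jobtracker log above
    nc -vz 192.168.130.108 50010

If this connect also times out, the failure is at the network level between the jobtracker node and the datanodes (the NAT setup described above would be the first place to look) rather than inside HDFS itself.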