The most likely problem is that the block report is taking more than 10 minutes.
Because of where the synchronized blocks sit in the core DataNode code, a block
report in progress locks out the heartbeat, which can lead the namenode to
conclude that the datanode has vanished.
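For what it is worth, the ~10 minute figure is the namenode's heartbeat expiry
window. Going from memory of the FSNamesystem code of that era (so treat the
property names and defaults below as an assumption, and check your version), it
is derived roughly like this:

    // sketch of how FSNamesystem derives the heartbeat expiry window (from memory)
    long heartbeatInterval        = conf.getLong("dfs.heartbeat.interval", 3) * 1000L;        // 3 seconds
    long heartbeatRecheckInterval = conf.getInt("heartbeat.recheck.interval", 5 * 60 * 1000); // 5 minutes
    long heartbeatExpireInterval  = 2 * heartbeatRecheckInterval + 10 * heartbeatInterval;    // ~10.5 minutes

so a datanode that cannot get a heartbeat through for roughly ten and a half
minutes gets marked dead.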

A simple way to check is to run a find over the directories listed in
dfs.data.dir (there is a small Java sketch below that does the same timing). If
that find takes more than about 8 minutes, you are in trouble.
The only real solutions are to add more datanodes, reduce the block count, or
increase your system's I/O speed so that the block report can complete in time.
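If you would rather script the check than eyeball a find, here is a minimal
standalone sketch that walks a data directory and times the traversal. The
default path and the 8 minute threshold are placeholders mirroring the rule of
thumb above; point it at whatever dfs.data.dir is set to on your datanode.

    import java.io.File;

    /** Times a recursive walk of a datanode data directory to estimate
     *  how long the block-report disk scan might take. */
    public class BlockReportScanTimer {

        // Counts regular files under dir, recursing into subdirectories.
        static long countFiles(File dir) {
            long count = 0;
            File[] entries = dir.listFiles();
            if (entries == null) {
                return 0;                        // unreadable path or not a directory
            }
            for (File entry : entries) {
                count += entry.isDirectory() ? countFiles(entry) : 1;
            }
            return count;
        }

        public static void main(String[] args) {
            // Placeholder: substitute the directory (or directories) from dfs.data.dir.
            File dataDir = new File(args.length > 0 ? args[0] : "/data/dfs/data");

            long start = System.currentTimeMillis();
            long files = countFiles(dataDir);
            long elapsedMs = System.currentTimeMillis() - start;

            System.out.println("Scanned " + files + " files in " + (elapsedMs / 1000) + " s");
            if (elapsedMs > 8L * 60 * 1000) {    // ~8 minute rule of thumb from above
                System.out.println("WARNING: traversal is slow enough that block reports"
                    + " may hold up heartbeats until the namenode marks the node dead.");
            }
        }
    }

If the walk already takes several minutes on a quiet disk, it will take far
longer while the datanode is also serving reads and writes.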

On Fri, Jul 17, 2009 at 6:12 AM, mingyang <[email protected]> wrote:

> I am using Hadoop to store my media files, but once the number of files
> exceeds one million, my datanode goes down about 10-20 minutes after Hadoop
> starts. The namenode log says it lost the heartbeat, yet the datanode looks
> normal to me: port 50010 still accepts a telnet connection and jps shows the
> datanode process still running. At that point, however, I can no longer put
> data into Hadoop, so I suspect the datanode service is effectively dead. Does
> Hadoop not support more than one million files? Which parameters should I
> adjust? I have already raised the open file limit to 65535.
>
>
> namenode log
>
> 2009-07-17 18:14:29,330 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root,bin,daemon,sys,adm,disk,wheel ip=/192.168.1.96 cmd=setPermission src=/hadoop/tmp/mapred/system/jobtracker.info dst=null perm=root:supergroup:rw-------
> 2009-07-17 18:14:29,336 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /hadoop/tmp/mapred/system/jobtracker.info. blk_-2148480138731090754_1403179
> 2009-07-17 18:14:32,958 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 192.168.1.97:50010 is added to blk_-2148480138731090754_1403179 size 4
> 2009-07-17 18:14:33,340 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: file /hadoop/tmp/mapred/system/jobtracker.info is closed by DFSClient_1037557306
> 2009-07-17 18:16:21,349 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 192.168.1.96
> 2009-07-17 18:16:21,349 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 7 Total time for transactions(ms): 1 Number of transactions batched in Syncs: 1 Number of syncs: 6 SyncTimes(ms): 9
> 2009-07-17 18:17:12,171 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll FSImage from 192.168.1.96
> 2009-07-17 18:17:12,171 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 1 SyncTimes(ms): 0
> 2009-07-17 18:51:00,566 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.heartbeatCheck: lost heartbeat from 192.168.1.97:50010
> 2009-07-17 18:51:25,383 INFO org.apache.hadoop.net.NetworkTopology: Removing a node: /default-rack/192.168.1.97:50010
> 2009-07-17 19:10:48,564 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease [Lease.  Holder: DFSClient_-1624377199, pendingcreates: 69] has expired hard limit
> 2009-07-17 19:10:48,564 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_-1624377199, pendingcreates: 69], src=/unp/01/video/B3/94/{B394EDB2-0302-34B9-5357-4904FFFEFF36}_100.unp
>
> datanode log
>
> 2009-07-17 18:52:40,719 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_6641053880336411514_601647
> 2009-07-17 18:52:12,421 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_-1074796653589819594_1392025
> 2009-07-17 18:51:44,074 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_-6350430159380231402_155334
> 2009-07-17 18:51:12,760 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_4607299987751845359_395290
> 2009-07-17 18:50:39,977 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_8029183549541139011_474989
> 2009-07-17 18:50:11,707 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_8468656648119049754_1065465
> 2009-07-17 18:49:39,421 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_4739535659946158302_532204
> 2009-07-17 18:49:11,213 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_-1495085838793109024_354553
>
> 09/07/17 18:02:09 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_-2536746364442878375_1403164 java.net.SocketTimeoutException: 63000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.94:54783 remote=/192.168.1.97:50010]
> 09/07/17 18:02:12 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: 63000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.94:54790 remote=/192.168.1.97:50010]
> 09/07/17 18:02:12 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: 63000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.94:54791 remote=/192.168.1.97:50010]
>



-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
