But even after I added a new datanode, this phenomenon still appears, and the data above is only about 10 GB. Strangely enough, after waiting about 10 minutes Hadoop recovers on its own. I do not know what the problem is; please help.
Here is the log:

09/07/19 18:46:13 INFO hdfs.DFSClient: Waiting to find target node: 192.168.1.97:50010
09/07/19 18:46:13 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: 66000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.94:16153 remote=/192.168.1.97:50010]
09/07/19 18:46:08 INFO hdfs.DFSClient: Waiting to find target node: 192.168.1.97:50010
09/07/19 18:46:08 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: 66000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.94:16152 remote=/192.168.1.97:50010]
09/07/19 18:45:01 INFO hdfs.DFSClient: Waiting to find target node: 192.168.1.105:50010
09/07/19 18:45:01 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.1.97:50010
09/07/19 18:44:56 INFO hdfs.DFSClient: Waiting to find target node: 192.168.1.97:50010
09/07/19 18:44:56 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: 66000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.94:16148 remote=/192.168.1.97:50010]

2009/7/17 mingyang <[email protected]>

> I ran this, and the result is far less than eight minutes, so is something
> wrong with my setup? I only changed fs.default.name; the other settings are
> default values.
>
> <name>fs.default.name</name>
> <value>hdfs: / / named: 9000</value>
>
> <name>mapred.job.tracker</name>
> <value>hdfs: / / named: 9001</value>
>
> <name>hadoop.tmp.dir</name>
> <value>/ hadoop / tmp</value>
>
> <name>dfs.name.dir</name>
> <value>/ hadoop / name</value>
>
> <name>dfs.data.dir</name>
> <value>/ b, / c, / d, / e, / f, / g, / h, / i, / j, / k, / l</value>
>
> <name>dfs.replication</name>
> <value>1</value>
>
> <name>dfs.block.size</name>
> <value>2560000</value>
>
> time find /b /c /d /e /f /g /h /i /j /k /l -type f -print > /dev/null
>
> real 0m4.608s
> user 0m1.489s
> sys 0m3.068s
>
> 2009/7/17 jason hadoop <[email protected]>
>
>> From the shell,
>> time find /b /c /d /e /f /g /h /i /j /k /l -type f -print > /dev/null
>>
>> Also, unless your distribution is magic, white space is not ignored in
>> value statements. Hopefully your actual value is
>> <value>/b,/c,/d,/e,/f,/g,/h,/i,/j,/k,/l</value>
>>
>> On Fri, Jul 17, 2009 at 6:29 AM, mingyang <[email protected]> wrote:
>>
>> > Thank you for your reply. I am a novice, and I do not quite understand
>> > how to check whether dfs.data.dir takes more than eight minutes.
>> > My dfs.data.dir setting is:
>> >
>> > <property>
>> > <name>dfs.data.dir</name>
>> > <value>/ b, / c, / d, / e, / f, / g, / h, / i, / j, / k, / l</value>
>> > </property>
>> >
>> > Each directory is a separate disk mount.
>> >
>> > 2009/7/17 jason hadoop <[email protected]>
>> >
>> > > Most likely the problem is that the block report is taking more than
>> > > 10 minutes.
>> > > Due to the placement of the sync blocks in the core Datanode code, the
>> > > block report locks out the heartbeat.
>> > > This can cause the namenode to think the datanode has vanished.
>> > >
>> > > A simple way to check is to run a find on the directory set specified
>> > > for dfs.data.dir. If this find takes more than 8 minutes or so, you
>> > > are in trouble.
>> > > The only solutions are to add more datanodes, reduce the block count,
>> > > or increase your system IO speed so that the block report can complete
>> > > in time.
>> > >
>> > > On Fri, Jul 17, 2009 at 6:12 AM, mingyang <[email protected]> wrote:
>> > >
>> > > > I am using Hadoop to store my media files, but once the number of
>> > > > files exceeds one million, my datanode goes down automatically about
>> > > > 10-20 minutes after Hadoop starts.
>> > > > The namenode log reports a lost heartbeat, yet the datanode looks
>> > > > normal to me: port 50010 still answers telnet, and jps shows the
>> > > > datanode process still running. Even so, I can no longer put data
>> > > > into Hadoop, so I guess the datanode service is effectively dead.
>> > > > Does Hadoop not support more than one million files? Which
>> > > > parameters should I adjust? I have already raised the open-file
>> > > > limit to 65535.
>> > > >
>> > > > namenode log:
>> > > >
>> > > > 2009-07-17 18:14:29,330 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root,bin,daemon,sys,adm,disk,wheel ip=/192.168.1.96 cmd=setPermission src=/hadoop/tmp/mapred/system/jobtracker.info dst=null perm=root:supergroup:rw-------
>> > > > 2009-07-17 18:14:29,336 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /hadoop/tmp/mapred/system/jobtracker.info. blk_-2148480138731090754_1403179
>> > > > 2009-07-17 18:14:32,958 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 192.168.1.97:50010 is added to blk_-2148480138731090754_1403179 size 4
>> > > > 2009-07-17 18:14:33,340 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: file /hadoop/tmp/mapred/system/jobtracker.info is closed by DFSClient_1037557306
>> > > > 2009-07-17 18:16:21,349 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 192.168.1.96
>> > > > 2009-07-17 18:16:21,349 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 7 Total time for transactions(ms): 1 Number of transactions batched in Syncs: 1 Number of syncs: 6 SyncTimes(ms): 9
>> > > > 2009-07-17 18:17:12,171 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll FSImage from 192.168.1.96
>> > > > 2009-07-17 18:17:12,171 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 1 SyncTimes(ms): 0
>> > > > 2009-07-17 18:51:00,566 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.heartbeatCheck: lost heartbeat from 192.168.1.97:50010
>> > > > 2009-07-17 18:51:25,383 INFO org.apache.hadoop.net.NetworkTopology: Removing a node: /default-rack/192.168.1.97:50010
>> > > > 2009-07-17 19:10:48,564 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease [Lease. Holder: DFSClient_-1624377199, pendingcreates: 69] has expired hard limit
>> > > > 2009-07-17 19:10:48,564 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_-1624377199, pendingcreates: 69], src=/unp/01/video/B3/94/{B394EDB2-0302-34B9-5357-4904FFFEFF36}_100.unp
>> > > >
>> > > > datanode log:
>> > > >
>> > > > 2009-07-17 18:52:40,719 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_6641053880336411514_601647
>> > > > 2009-07-17 18:52:12,421 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_-1074796653589819594_1392025
>> > > > 2009-07-17 18:51:44,074 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_-6350430159380231402_155334
>> > > > 2009-07-17 18:51:12,760 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_4607299987751845359_395290
>> > > > 2009-07-17 18:50:39,977 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_8029183549541139011_474989
>> > > > 2009-07-17 18:50:11,707 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_8468656648119049754_1065465
>> > > > 2009-07-17 18:49:39,421 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_4739535659946158302_532204
>> > > > 2009-07-17 18:49:11,213 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_-1495085838793109024_354553
>> > > >
>> > > > 09/07/17 18:02:09 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_-2536746364442878375_1403164 java.net.SocketTimeoutException: 63000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.94:54783 remote=/192.168.1.97:50010]
>> > > > 11096473 09/07/17 18:02:12 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: 63000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.94:54790 remote=/192.168.1.97:50010]
>> > > > 11096475 09/07/17 18:02:12 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: 63000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.94:54791 remote=/192.168.1.97:50010]
>> > >
>> > > --
>> > > Pro Hadoop, a book to guide you from beginner to hadoop mastery,
>> > > http://www.amazon.com/dp/1430219424?tag=jewlerymall
>> > > www.prohadoopbook.com a community for Hadoop Professionals
>> >
>> > --
>> > Best regards!
>> >
>> > 王明阳
>>
>> --
>> Pro Hadoop, a book to guide you from beginner to hadoop mastery,
>> http://www.amazon.com/dp/1430219424?tag=jewlerymall
>> www.prohadoopbook.com a community for Hadoop Professionals
>
> --
> Best regards!
>
> 王明阳

--
Best regards!

王明阳
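A note on what "reducing the block count" from the advice above can mean in practice here: with dfs.block.size set to 2560000 (about 2.5 MB), any file larger than 2.5 MB is split into many blocks, so a million media files can produce a block report too large to finish before the heartbeat is missed. If the files are bigger than the block size, raising dfs.block.size shrinks the block count for the same data; it only applies to files written after the change, since existing blocks keep their size. A minimal hdfs-site.xml sketch of the two properties discussed in the thread; the 64 MB value is the stock default and is an assumption, not something verified on this cluster:

<!-- sketch only: dfs.data.dir with the whitespace removed from the value, as jason suggests -->
<property>
  <name>dfs.data.dir</name>
  <value>/b,/c,/d,/e,/f,/g,/h,/i,/j,/k,/l</value>
</property>

<!-- assumption: the default 64 MB block size; fewer, larger blocks mean a smaller block report -->
<property>
  <name>dfs.block.size</name>
  <value>67108864</value>
</property>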
