I'd say that this is your problem right here:

Heap Size is 972.5 MB / 972.5 MB (100%)

As suspected, your very small block size + many files has completely
filled the NameNode heap. All bets are off as to what Hadoop will do
at this point.
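For scale, a rough back-of-envelope from your cluster summary below
(2,027,545 files and directories, 2,008,837 blocks), using the approximate
per-object sizes from my earlier mail (real JVM overhead pushes this higher):

  2,008,837 blocks            x ~192 bytes          ~= 385 MB
  2,027,545 files/directories x (~124 bytes + name) ~= 250-300 MB
  ----------------------------------------------------------------
  roughly 650-700 MB before GC headroom and the NameNode's other
  in-memory structures, so a 972.5 MB heap is essentially spent.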

Potential solutions (rough example settings for each are sketched after
the list):
1) Increase the HADOOP_HEAPSIZE parameter in
$HADOOP_HOME/conf/hadoop-env.sh on the NameNode to give your NameNode
more heap space.

2) Increase your block size to something more reasonable. You are
running Hadoop drastically outside the normal configuration: I don't
know anyone who uses blocks smaller than 64MB, and I certainly can't
imagine that many people run at 8 MB/block or less. You may be
uncovering new bugs by running this far outside the norm. Note that
increasing the configured block size will not do anything to your
existing data; you'll need to reimport it somehow.

3) Delete a lot of files until your NameNode can cope.
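For reference, here is roughly what these could look like; the 2 GB heap,
64MB block size, and the path below are only example values to adapt to
your own hardware and data.

In $HADOOP_HOME/conf/hadoop-env.sh on the NameNode:

  # JVM heap for the Hadoop daemons started on this machine, in MB.
  # 2000 is only an example; size it to the NameNode's physical RAM.
  export HADOOP_HEAPSIZE=2000

In your hadoop-site.xml (or wherever you set dfs.block.size; this only
affects newly written files):

  <property>
    <name>dfs.block.size</name>
    <!-- 64MB in bytes, the usual default -->
    <value>67108864</value>
  </property>

For (3), something along the lines of

  bin/hadoop fs -rmr /some/tree/you/can/spare

(the path is just a placeholder) removes whole directory trees, and the
NameNode should release the corresponding metadata once the blocks are gone.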

- Aaron




On Mon, Jul 20, 2009 at 8:51 PM, mingyang<[email protected]> wrote:
> I have carefully checked the version issue and made sure that the hadoop on
> every node comes from the same tar package. As for the block size: the small
> value comes from a test I ran. When I set the block size to 64MB, the
> front-end apache could only read documents at 300-400KB while I saw network
> traffic to the port as high as 40-60Mb, so I tried to solve the problem by
> splitting the data into smaller files. Through that process I split each
> file into roughly 2MB pieces before storing, to see whether the results
> would be better.
>  When shredding the documents I ran into a problem: the number of files grew
> by two orders of magnitude, to about 2 million. Initially my hadoop had only
> two machines, one running a datanode and one running the namenode. Once the
> file count passed one million, putting data into hadoop would be normal for
> the first 10 minutes or so, and then I would see datanode errors:
> 09/07/19
> 18:46:13 INFO hdfs.DFSClient: Waiting to find target node:
> 192.168.1.97:50010
> 09/07/19 18:46:13 INFO hdfs.DFSClient: Exception in createBlockOutputStream
> java.net.SocketTimeoutException: 66000 millis timeout while waiting for
> channel to be ready for read. ch : java.nio.channels.SocketChannel[connected
> local=/192.168.1.94:16153 remote=/192.168.1.97:50010]
> 2009-07-19 18:47:21,367 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
> writeBlock blk_-8134427742713358791_1418899 received exception
> java.io.EOFException: while trying to read 65557 bytes
> 2009-07-19 18:47:21,367 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
> 192.168.1.97:50010, storageID=DS-148204255-192.168.1.97-50010-1247630449653,
> infoPort=50075, ipcPort=50020):DataXceiver
> java.io.EOFException: while trying to read 65557 bytes
>
>
>  At the same time the namenode logs no error; only about 10 minutes later
> does it report that the datanode's heartbeat was lost. On the datanode host,
> jps still shows the datanode process and port 50010 can still be reached
> with telnet, and around the same time the problem seems to clear up on its
> own. At this point, running lsof shows connections that appear to be stuck
> waiting:
> java 9233 root 67u IPv4 5508710 TCP master:50010 -> web01:10630 (ESTABLISHED)
> java 9233 root 69u IPv4 5509810 TCP master:50010 -> web01:10633 (ESTABLISHED)
> I am still not clear why this happens. All I could do was add 2 more
> datanodes and make sure that no single datanode holds more than one million
> files. The following is from my namenode:
>
>
> [r...@named logs]# free
>             total       used       free     shared    buffers     cached
> Mem:       4051552    3583876     467676          0     113360     944980
> -/+ buffers/cache:    2525536    1526016
> Swap:      4192924        120    4192804
>
> Cluster Summary: 2027545 files and directories, 2008837 blocks = 4036382
> total. Heap Size is 972.5 MB / 972.5 MB (100%)
> Configured Capacity : 12.09 TB
> DFS Used : 2.88 TB
> Non DFS Used : 632.67 GB
> DFS Remaining : 8.59 TB
> DFS Used% : 23.85 %
> DFS Remaining% : 71.03 %
> Live Nodes : 3
> Dead Nodes : 0
>
> NameNode Storage:
> Storage Directory    Type             State
> /hadoop/name         IMAGE_AND_EDITS  Active
>
>
> 2009/7/21 Aaron Kimball <[email protected]>
>
>> A VersionMismatch occurs because you're using different builds of Hadoop on
>> your different nodes. All DataNodes and the NameNode must be running the
>> exact same compilation of Hadoop (It's very strict).
>>
>> One thing I noticed in your config is that you set dfs.block.size to
>> 2560000. That's certainly not the default value. Note that this value is in
>> bytes. That's 2.5 MB/block. If you've got a million or more files in there,
>> of any reasonable size, then you might have several million blocks.
>> Depending on how powerful your NameNode's hardware is, that could take up a
>> lot of RAM. Each block is 192 bytes, and each file is 124 bytes + the
>> filename length.
>>
>> When you run 'bin/hadoop dfsadmin -report' or look at the status JSP at
>> http://named:50070/, how much RAM free does it report? You might be using
>> up
>> a lot of the NameNode's available RAM, which could cause various problems
>> involving spending all its time in GC.
>>
>> In the future I'd recommend setting your block size 5 or 10x higher. Note
>> that this won't affect existing blocks/files though. You may need to
>> somehow
>> read and rewrite those files to take advantage of a larger block size to
>> alleviate block pressure.
>>
>> - Aaron
>>
>>
>> On Sun, Jul 19, 2009 at 3:59 PM, mingyang <[email protected]> wrote:
>>
>> > time find / b / c / d / e / f / g / h / i / j / k / l-type f-print> / dev
>> /
>> > null
>> >
>> > real 0m4.608s
>> > user 0m1.489s
>> > sys 0m3.068s
>> >
>> > I ran this, and the result is far less than eight minutes. Is my setup
>> > wrong somewhere? I only changed fs.default.name; the other settings are
>> > the default values
>> >
>> >   <name> fs.default.name </ name>
>> >   <value> hdfs: / / named: 9000 </ value>
>> >
>> >
>> >
>> >   <name> mapred.job.tracker </ name>
>> >   <value> hdfs: / / named: 9001 </ value>
>> >
>> >
>> >
>> >   <name> hadoop.tmp.dir </ name>
>> >   <value> / hadoop / tmp </ value>
>> >
>> >   <name> dfs.name.dir </ name>
>> >   <value> / hadoop / name </ value>
>> >
>> >
>> >   <name> dfs.data.dir </ name>
>> >   <value> / b, / c, / d, / e, / f, / g, / h, / i, / j, / k, / l </ value>
>> >
>> >
>> >
>> >   <name> dfs.replication </ name>
>> >   <value> 1 </ value>
>> >
>> >
>> >   <name> dfs.block.size </ name>
>> >   <value> 2560000 </ value>
>> >
>> >
>> >
>> > 2009/7/20 Jason Venner <[email protected]>
>> >
>> > > Did you run this command on the datanode that not responding?
>> > >
>> > > On Sun, Jul 19, 2009 at 3:59 AM, mingyang <[email protected]>
>> wrote:
>> > >
>> > > > in datanode logs , I found a new error message.
>> > > > Would like to help solve the problem
>> > > >
>> > > > 2009-07-19 18:40:43,464 ERROR
>> > > > org.apache.hadoop.hdfs.server.datanode.DataNode:
>> DatanodeRegistration(
>> > > > 192.168.1.97:50010,
>> > > > storageID=DS-148204255-192.168.1.97-50010-1247630449653,
>> > > > infoPort=50075, ipcPort=50020):DataXceiver
>> > > > java.io.IOException: Version Mismatch
>> > > >        at
>> > > >
>> > >
>> >
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:81)
>> > > >        at java.lang.Thread.run(Thread.java:619)
>> > > > 2009-07-19 18:40:43,464 INFO
>> > > > org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
>> > > > blk_9198148243909908835_1418899 src: /192.168.1.94:21486 dest: /
>> > > > 192.168.1.97:50010
>> > > > 2009-07-19 18:41:43,363 INFO
>> > > > org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
>> > > > blk_4464496078754951058_1418901 src: /192.168.1.105:17434 dest: /
>> > > > 192.168.1.97:50010
>> > > > 2009-07-19 18:41:55,318 INFO
>> > > > org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
>> > > > blk_-998769390764365324_1418903 src: /192.168.1.105:17436 dest: /
>> > > > 192.168.1.97:50010
>> > > > 2009-07-19 18:42:51,983 INFO
>> > > > org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
>> > > > blk_7357805261059813870_1418903 src: /192.168.1.94:16148 dest: /
>> > > > 192.168.1.97:50010
>> > > > 2009-07-19 18:43:00,936 INFO
>> > > > org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
>> > > > blk_6766332640664853809_1418903 src: /192.168.1.105:60392 dest: /
>> > > > 192.168.1.97:50010
>> > > > 2009-07-19 18:44:03,112 INFO
>> > > > org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
>> > > > blk_8833189579369772797_1418903 src: /192.168.1.94:16152 dest: /
>> > > > 192.168.1.97:50010
>> > > > 2009-07-19 18:44:07,105 INFO
>> > > > org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
>> > > > blk_-92272670489114760_1418903 src: /192.168.1.94:16153 dest: /
>> > > > 192.168.1.97:50010
>> > > > 2009-07-19 18:45:15,301 INFO
>> > > > org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
>> > > > blk_-3189060218922169017_1418903 src: /192.168.1.94:16157 dest: /
>> > > > 192.168.1.97:50010
>> > > > 2009-07-19 18:45:19,299 INFO
>> > > > org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
>> > > > blk_2093621896162975184_1418903 src: /192.168.1.105:60400 dest: /
>> > > > 192.168.1.97:50010
>> > > > 2009-07-19 18:46:27,602 INFO
>> > > > org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
>> > > > blk_-2879021713701012781_1418905 src: /192.168.1.105:60404 dest: /
>> > > > 192.168.1.97:50010
>> > > > 2009-07-19 18:46:27,602 INFO
>> > > > org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
>> > > > blk_356680792598474797_1418907 src: /192.168.1.105:60406 dest: /
>> > > > 192.168.1.97:50010
>> > > > 2009-07-19 18:47:21,366 INFO
>> > > > org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in
>> > > receiveBlock
>> > > > for block blk_-8134427742713358791_1418899 java.io.EOFException:
>> while
>> > > > trying to read 65557 bytes
>> > > > 2009-07-19 18:47:21,367 INFO
>> > > > org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0
>> for
>> > > > block
>> > > > blk_-8134427742713358791_1418899 Interrupted.
>> > > > 2009-07-19 18:47:21,367 INFO
>> > > > org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0
>> for
>> > > > block
>> > > > blk_-8134427742713358791_1418899 terminating
>> > > > 2009-07-19 18:47:21,367 INFO
>> > > > org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
>> > > > blk_-8134427742713358791_1418899 received exception
>> > java.io.EOFException:
>> > > > while trying to read 65557 bytes
>> > > > 2009-07-19 18:47:21,367 ERROR
>> > > > org.apache.hadoop.hdfs.server.datanode.DataNode:
>> DatanodeRegistration(
>> > > > 192.168.1.97:50010,
>> > > > storageID=DS-148204255-192.168.1.97-50010-1247630449653,
>> > > > infoPort=50075, ipcPort=50020):DataXceiver
>> > > > java.io.EOFException: while trying to read 65557 bytes
>> > > >        at
>> > > >
>> > > >
>> > >
>> >
>> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:264)
>> > > >        at
>> > > >
>> > > >
>> > >
>> >
>> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:308)
>> > > >        at
>> > > >
>> > > >
>> > >
>> >
>> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:372)
>> > > >        at
>> > > >
>> > > >
>> > >
>> >
>> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:524)
>> > > >        at
>> > > >
>> > > >
>> > >
>> >
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
>> > > >        at
>> > > >
>> > > >
>> > >
>> >
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
>> > > >        at java.lang.Thread.run(Thread.java:619)
>> > > > 2009-07-19 18:47:21,368 INFO
>> > > > org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in
>> > > receiveBlock
>> > > > for block blk_-346503319966521195_1418897 java.io.EOFException: while
>> > > > trying
>> > > > to read 65557 bytes
>> > > > 2009-07-19 18:47:21,371 INFO
>> > > > org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0
>> for
>> > > > block
>> > > > blk_-346503319966521195_1418897 Interrupted.
>> > > > 2009-07-19 18:47:21,373 INFO
>> > > > org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0
>> for
>> > > > block
>> > > > blk_-346503319966521195_1418897 terminating
>> > > >
>> > > > 2009/7/19 mingyang <[email protected]>
>> > > >
>> > > > > But even after adding a new datanode this still happens, and the
>> > > > > data involved is only about 10 GB.
>> > > > > Strangely enough, after waiting about 10 minutes hadoop resumes on
>> > > > > its own. I do not know what this problem is; asking for help
>> > > > >
>> > > > >
>> > > > > it's log
>> > > > > 09/07/19 18:46:13 INFO hdfs.DFSClient: Waiting to find target node:
>> > > > > 192.168.1.97:50010
>> > > > > 09/07/19 18:46:13 INFO hdfs.DFSClient: Exception in
>> > > > createBlockOutputStream
>> > > > > java.net.SocketTimeoutException: 66000 millis timeout wh
>> > > > > ile waiting for channel to be ready for read. ch :
>> > > > > java.nio.channels.SocketChannel[connected local=/
>> 192.168.1.94:16153
>> > > > remote=/192.1
>> > > > > 68.1.97:50010]
>> > > > > 09/07/19 18:46:08 INFO hdfs.DFSClient: Waiting to find target node:
>> > > > > 192.168.1.97:50010
>> > > > > 09/07/19 18:46:08 INFO hdfs.DFSClient: Exception in
>> > > > createBlockOutputStream
>> > > > > java.net.SocketTimeoutException: 66000 millis timeout wh
>> > > > > ile waiting for channel to be ready for read. ch :
>> > > > > java.nio.channels.SocketChannel[connected local=/
>> 192.168.1.94:16152
>> > > > remote=/192.1
>> > > > > 68.1.97:50010]
>> > > > > 09/07/19 18:45:01 INFO hdfs.DFSClient: Waiting to find target node:
>> > > > > 192.168.1.105:50010
>> > > > > 09/07/19 18:45:01 INFO hdfs.DFSClient: Exception in
>> > > > createBlockOutputStream
>> > > > > java.io.IOException: Bad connect ack with firstBadLink 1
>> > > > > 92.168.1.97:50010
>> > > > > 09/07/19 18:44:56 INFO hdfs.DFSClient: Waiting to find target node:
>> > > > > 192.168.1.97:50010
>> > > > > 09/07/19 18:44:56 INFO hdfs.DFSClient: Exception in
>> > > > createBlockOutputStream
>> > > > > java.net.SocketTimeoutException: 66000 millis timeout wh
>> > > > > ile waiting for channel to be ready for read. ch :
>> > > > > java.nio.channels.SocketChannel[connected local=/
>> 192.168.1.94:16148
>> > > > remote=/192.1
>> > > > > 68.1.97:50010]
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > 2009/7/17 mingyang <[email protected]>
>> > > > >
>> > > > > I run this, the result is far less than eight minutes, is not where
>> I
>> > > set
>> > > > >> up right? I just changed the fs.default.name, other settings are
>> > > > default
>> > > > >> values
>> > > > >>    <name> fs.default.name </ name>
>> > > > >>    <value> hdfs: / / named: 9000 </ value>
>> > > > >>
>> > > > >>
>> > > > >>
>> > > > >>    <name> mapred.job.tracker </ name>
>> > > > >>    <value> hdfs: / / named: 9001 </ value>
>> > > > >>
>> > > > >>
>> > > > >>
>> > > > >>    <name> hadoop.tmp.dir </ name>
>> > > > >>    <value> / hadoop / tmp </ value>
>> > > > >>
>> > > > >>    <name> dfs.name.dir </ name>
>> > > > >>    <value> / hadoop / name </ value>
>> > > > >>
>> > > > >>
>> > > > >>    <name> dfs.data.dir </ name>
>> > > > >>    <value> / b, / c, / d, / e, / f, / g, / h, / i, / j, / k, / l
>> </
>> > > > value>
>> > > > >>
>> > > > >>
>> > > > >>
>> > > > >>
>> > > > >>    <name> dfs.replication </ name>
>> > > > >>    <value> 1 </ value>
>> > > > >>
>> > > > >>
>> > > > >>    <name> dfs.block.size </ name>
>> > > > >>    <value> 2560000 </ value>
>> > > > >>
>> > > > >>
>> > > > >>
>> > > > >> time find / b / c / d / e / f / g / h / i / j / k / l-type
>> f-print>
>> > /
>> > > > dev
>> > > > >> / null
>> > > > >>
>> > > > >> real 0m4.608s
>> > > > >> user 0m1.489s
>> > > > >> sys 0m3.068s
>> > > > >>
>> > > > >>
>> > > > >> 2009/7/17 jason hadoop <[email protected]>
>> > > > >>
>> > > > >>> From the shell,
>> > > > >>> time find /b /c /d /e /f /g /h /i /j /k /l -type f -print >
>> > /dev/null
>> > > > >>>
>> > > > >>> Also, unless your distribution is magic, white space is not
>> ignored
>> > > in
>> > > > >>> value
>> > > > >>> statements. Hopefully your actual value is
>> > > > >>> <value>/b,/c,/d,/e,/f,/g,/h,/i,/j,/k,/l</value>
>> > > > >>>
>> > > > >>> On Fri, Jul 17, 2009 at 6:29 AM, mingyang <[email protected]>
>> > > > wrote:
>> > > > >>>
>> > > > >>> > Thank you for your reply. I am a novice and do not quite
>> > > > >>> > understand how to check whether the find over dfs.data.dir
>> > > > >>> > takes more than eight minutes?
>> > > > >>> > My settings are dfs.data.dir
>> > > > >>> > <property>
>> > > > >>> >   <name> dfs.data.dir </ name>
>> > > > >>> >   <value> / b, / c, / d, / e, / f, / g, / h, / i, / j, / k, / l
>> > </
>> > > > >>> value>
>> > > > >>> >   </ property>
>> > > > >>> >
>> > > > >>> > Each directory has a disk mount
>> > > > >>> >
>> > > > >>> > 2009/7/17 jason hadoop <[email protected]>
>> > > > >>> >
>> > > > >>> > > Most likely problem is that the block report is taking more
>> > than
>> > > 10
>> > > > >>> > > minutes.
>> > > > >>> > > Due to the placement of the sync blocks in the core Datanode
>> > > code,
>> > > > >>> the
>> > > > >>> > > block
>> > > > >>> > > report locks out the heartbeat.
>> > > > >>> > > This can cause the namenode to think the datanode has
>> vanished.
>> > > > >>> > >
>> > > > >>> > > A simple way to check, is to run a find on the directory set
>> > > > >>> specified
>> > > > >>> > for
>> > > > >>> > > dfs.data.dir. If this find takes more than 8 minutes or so,
>> you
>> > > are
>> > > > >>> in
>> > > > >>> > > trouble.
>> > > > >>> > > The only solutions are to add more datanodes, reduce the
>> > > > >>> > > block count, or increase your system io speed so that the
>> > > > >>> > > block report may complete in time.
>> > > > >>> > >
>> > > > >>> > > On Fri, Jul 17, 2009 at 6:12 AM, mingyang <
>> > [email protected]>
>> > > > >>> wrote:
>> > > > >>> > >
>> > > > >>> > > > I am using hadoop to store my media files, but once the
>> > > > >>> > > > number of documents exceeds one million, my datanode goes
>> > > > >>> > > > down automatically about 10-20 minutes after Hadoop starts.
>> > > > >>> > > > The namenode log shows that the heartbeat was lost, yet the
>> > > > >>> > > > datanode looks normal to me: port 50010 can be telnetted
>> > > > >>> > > > normally and jps shows the datanode still running. However,
>> > > > >>> > > > at that point I can no longer put data into hadoop, so I
>> > > > >>> > > > guess the datanode service is dead. Does hadoop not support
>> > > > >>> > > > more than one million documents? Which parameters should I
>> > > > >>> > > > adjust? I have already raised the open file limit to 65535
>> > > > >>> > > >
>> > > > >>> > > >
>> > > > >>> > > > namenode log
>> > > > >>> > > >
>> > > > >>> > > > 2009-07-17 18:14:29,330 INFO
>> > > > >>> > > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit:
>> > > > >>> > > >
>> > > > >>> > > > ugi=root,root,bin,daemon,sys,adm,disk,wheelip=/
>> 192.168.1.96
>> > > > >>> > > > cmd=setPermission
>> > > > >>> > > >
>> > > > >>> > > > src=/hadoop/tmp/mapred/system/jobtracker.info   dst=null
>> > > > >>> > > > perm=root:supergroup
>> > > > >>> > > > :rw-------
>> > > > >>> > > > 2009-07-17 18:14:29,336 INFO
>> > > org.apache.hadoop.hdfs.StateChange:
>> > > > >>> BLOCK*
>> > > > >>> > > > NameSystem.allocateBlock:
>> > > > >>> > > >
>> > > > >>> > > > /hadoop/tmp/mapred/system/jobtrack
>> > > > >>> > > > er.info. blk_-2148480138731090754_1403179
>> > > > >>> > > > 2009-07-17 18:14:32,958 INFO
>> > > org.apache.hadoop.hdfs.StateChange:
>> > > > >>> BLOCK*
>> > > > >>> > > > NameSystem.addStoredBlock: blockMap updated:
>> > > > >>> > > >
>> > > > >>> > > > 192.168.1.97:50
>> > > > >>> > > > 010 is added to blk_-2148480138731090754_1403179 size 4
>> > > > >>> > > > 2009-07-17 18:14:33,340 INFO
>> > > org.apache.hadoop.hdfs.StateChange:
>> > > > >>> DIR*
>> > > > >>> > > > NameSystem.completeFile: file
>> > > > >>> > > >
>> > > > >>> > > > /hadoop/tmp/mapred/system/jobtra
>> > > > >>> > > > cker.info is closed by DFSClient_1037557306
>> > > > >>> > > > 2009-07-17 18:16:21,349 INFO
>> > > > >>> > > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll
>> > Edit
>> > > > Log
>> > > > >>> from
>> > > > >>> > > > 192.168.1.96
>> > > > >>> > > > 2009-07-17 18:16:21,349 INFO
>> > > > >>> > > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number
>> > of
>> > > > >>> > > > transactions:
>> > > > >>> > > > 7 Total time for
>> > > > >>> > > >
>> > > > >>> > > > transacti
>> > > > >>> > > > ons(ms): 1Number of transactions batched in Syncs: 1 Number
>> > of
>> > > > >>> syncs: 6
>> > > > >>> > > > SyncTimes(ms): 9
>> > > > >>> > > > 2009-07-17 18:17:12,171 INFO
>> > > > >>> > > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll
>> > > FSImage
>> > > > >>> from
>> > > > >>> > > > 192.168.1.96
>> > > > >>> > > > 2009-07-17 18:17:12,171 INFO
>> > > > >>> > > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number
>> > of
>> > > > >>> > > > transactions:
>> > > > >>> > > > 0 Total time for
>> > > > >>> > > >
>> > > > >>> > > > transacti
>> > > > >>> > > > ons(ms): 0Number of transactions batched in Syncs: 0 Number
>> > of
>> > > > >>> syncs: 1
>> > > > >>> > > > SyncTimes(ms): 0
>> > > > >>> > > > 2009-07-17 18:51:00,566 INFO
>> > > org.apache.hadoop.hdfs.StateChange:
>> > > > >>> BLOCK*
>> > > > >>> > > > NameSystem.heartbeatCheck: lost heartbeat from
>> > > > >>> > > >
>> > > > >>> > > > 192.168.1.97:
>> > > > >>> > > > 50010
>> > > > >>> > > > 2009-07-17 18:51:25,383 INFO
>> > > > org.apache.hadoop.net.NetworkTopology:
>> > > > >>> > > > Removing
>> > > > >>> > > > a node: /default-rack/192.168.1.97:50010
>> > > > >>> > > > 2009-07-17 19:10:48,564 INFO
>> > > > >>> > > > org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease
>> > > > [Lease.
>> > > > >>> > >  Holder:
>> > > > >>> > > > DFSClient_-
>> > > > >>> > > >
>> > > > >>> > > > 1624377199, pend
>> > > > >>> > > > ingcreates: 69] has expired hard limit
>> > > > >>> > > > 2009-07-17 19:10:48,564 INFO
>> > > > >>> > > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>> > Recovering
>> > > > >>> > > > lease=[Lease.  Holder:
>> > > > >>> > > >
>> > > > >>> > > > DFSClient_-16243
>> > > > >>> > > > 77199, pendingcreates: 69],
>> > > > >>> > > >
>> > > > >>>
>> > > src=/unp/01/video/B3/94/{B394EDB2-0302-34B9-5357-4904FFFEFF36}_100.unp
>> > > > >>> > > >
>> > > > >>> > > > datanode log
>> > > > >>> > > >
>> > > > >>> > > > 2009-07-17 18:52:40,719 INFO
>> > > > >>> > > > org.apache.hadoop.hdfs.server.datanode.DataBlockScanner:
>> > > > >>> Verification
>> > > > >>> > > > succeeded for blk_664105388033641
>> > > > >>> > > > 1514_601647
>> > > > >>> > > > 2009-07-17 18:52:12,421 INFO
>> > > > >>> > > > org.apache.hadoop.hdfs.server.datanode.DataBlockScanner:
>> > > > >>> Verification
>> > > > >>> > > > succeeded for blk_-10747966535898
>> > > > >>> > > > 19594_1392025
>> > > > >>> > > > 2009-07-17 18:51:44,074 INFO
>> > > > >>> > > > org.apache.hadoop.hdfs.server.datanode.DataBlockScanner:
>> > > > >>> Verification
>> > > > >>> > > > succeeded for blk_-63504301593802
>> > > > >>> > > > 31402_155334
>> > > > >>> > > > 2009-07-17 18:51:12,760 INFO
>> > > > >>> > > > org.apache.hadoop.hdfs.server.datanode.DataBlockScanner:
>> > > > >>> Verification
>> > > > >>> > > > succeeded for blk_460729998775184
>> > > > >>> > > > 5359_395290
>> > > > >>> > > > 2009-07-17 18:50:39,977 INFO
>> > > > >>> > > > org.apache.hadoop.hdfs.server.datanode.DataBlockScanner:
>> > > > >>> Verification
>> > > > >>> > > > succeeded for blk_802918354954113
>> > > > >>> > > > 9011_474989
>> > > > >>> > > > 2009-07-17 18:50:11,707 INFO
>> > > > >>> > > > org.apache.hadoop.hdfs.server.datanode.DataBlockScanner:
>> > > > >>> Verification
>> > > > >>> > > > succeeded for blk_846865664811904
>> > > > >>> > > > 9754_1065465
>> > > > >>> > > > 2009-07-17 18:49:39,421 INFO
>> > > > >>> > > > org.apache.hadoop.hdfs.server.datanode.DataBlockScanner:
>> > > > >>> Verification
>> > > > >>> > > > succeeded for blk_473953565994615
>> > > > >>> > > > 8302_532204
>> > > > >>> > > > 2009-07-17 18:49:11,213 INFO
>> > > > >>> > > > org.apache.hadoop.hdfs.server.datanode.DataBlockScanner:
>> > > > >>> Verification
>> > > > >>> > > > succeeded for blk_-14950858387931
>> > > > >>> > > > 09024_354553
>> > > > >>> > > >
>> > > > >>> > > > 09/07/17 18:02:09 WARN hdfs.DFSClient: DFSOutputStream
>> > > > >>> > ResponseProcessor
>> > > > >>> > > > exception  for block
>> > > > >>> > > >
>> > > blk_-2536746364442878375_1403164java.net.SocketTimeoutException:
>> > > > >>> 63000
>> > > > >>> > > > millis timeout while waiting for channel to be ready for
>> > read.
>> > > ch
>> > > > :
>> > > > >>> > > > java.nio.channels.SocketChannel[connected local=/
>> > > > >>> 192.168.1.94:54783
>> > > > >>> > > remote=/
>> > > > >>> > > > 192.168.1.97:50010]
>> > > > >>> > > > 11096473        09/07/17 18:02:12 INFO hdfs.DFSClient:
>> > > Exception
>> > > > in
>> > > > >>> > > > createBlockOutputStream java.net.SocketTimeoutException:
>> > 63000
>> > > > >>> millis
>> > > > >>> > > > timeout while waiting for channel to be ready for read. ch
>> :
>> > > > >>> > > > java.nio.channels.SocketChannel[connected local=/
>> > > > >>> 192.168.1.94:54790
>> > > > >>> > > remote=/
>> > > > >>> > > > 192.168.1.97:50010]
>> > > > >>> > > > 11096475        09/07/17 18:02:12 INFO hdfs.DFSClient:
>> > > Exception
>> > > > in
>> > > > >>> > > > createBlockOutputStream java.net.SocketTimeoutException:
>> > 63000
>> > > > >>> millis
>> > > > >>> > > > timeout while waiting for channel to be ready for read. ch
>> :
>> > > > >>> > > > java.nio.channels.SocketChannel[connected local=/
>> > > > >>> 192.168.1.94:54791
>> > > > >>> > > remote=/
>> > > > >>> > > > 192.168.1.97:50010]
>> > > > >>> > > >
>> > > > >>> > >
>> > > > >>> > >
>> > > > >>> > >
>> > > > >>> > > --
>> > > > >>> > > Pro Hadoop, a book to guide you from beginner to hadoop
>> > mastery,
>> > > > >>> > > http://www.amazon.com/dp/1430219424?tag=jewlerymall
>> > > > >>> > > www.prohadoopbook.com a community for Hadoop Professionals
>> > > > >>> > >
>> > > > >>> >
>> > > > >>> >
>> > > > >>> >
>> > > > >>> > --
>> > > > >>> > Regards,
>> > > > >>> >
>> > > > >>> >
>> > > > >>> > 王明阳
>> > > > >>> >
>> > > > >>>
>> > > > >>>
>> > > > >>>
>> > > > >>> --
>> > > > >>> Pro Hadoop, a book to guide you from beginner to hadoop mastery,
>> > > > >>> http://www.amazon.com/dp/1430219424?tag=jewlerymall
>> > > > >>> www.prohadoopbook.com a community for Hadoop Professionals
>> > > > >>>
>> > > > >>
>> > > > >>
>> > > > >>
>> > > > >> --
>> > > > >> Regards,
>> > > > >>
>> > > > >>
>> > > > >> 王明阳
>> > > > >>
>> > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > Regards,
>> > > > >
>> > > > >
>> > > > > 王明阳
>> > > > >
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > > Regards,
>> > > >
>> > > >
>> > > > 王明阳
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > Pro Hadoop, a book to guide you from beginner to hadoop mastery,
>> > > http://www.amazon.com/dp/1430219424?tag=jewlerymall
>> > > www.prohadoopbook.com a community for Hadoop Professionals
>> > >
>> >
>> >
>> >
>> > --
>> > Regards,
>> >
>> >
>> > 王明阳
>> >
>>
>
>
>
> --
> Regards,
>
>
> 王明阳
>
