[
https://issues.apache.org/jira/browse/HADOOP-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611465#action_12611465
]
rangadi edited comment on HADOOP-3707 at 7/7/08 11:05 PM:
---------------------------------------------------------------
Proposal that Hairong and I discussed (this seems safe enough for 0.17 and
0.18):
* For each datanode, the NameNode maintains a counter {{approxBlocksScheduled}}.
** Incremented each time a block is scheduled to the datanode.
** Decremented when the NameNode receives a 'block received' message from the datanode.
** No list of block ids is maintained.
** Not every scheduled block will eventually produce a 'block received'
message. The count will be corrected over time, as described below.
* Disk space left on a datanode will be estimated as ({{freespace_reported_in_heartbeat -
(approxBlocksScheduled+prevApproxBlocksScheduled)*defaultBlockSize}})
* 'approx' in the name of the variable is deliberate, since the count is not expected
to be very accurate. It will be handled like this:
** Another variable, {{prevApproxBlocksScheduled}}, is maintained.
** Every 5 minutes or so, the value of 'prev' will be discarded: 'prev' will be set
to the current value and current will be reset to zero.
** So if some blocks are never reported back by the datanode, the count
will eventually be adjusted (usually within 10 min; a bit longer if the datanode is
continuously receiving blocks).
** It's not an error if the NameNode receives a 'block received' message while this
counter is zero.
* This count will also be useful for throttling the number of blocks scheduled for
replication (maybe the limit could be something large, like 50 or 100).
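A rough Java sketch of the two-counter scheme above, to make the rollover concrete. It is only an illustration under stated assumptions: the class {{ScheduledBlockCounter}} and its method names are hypothetical and not taken from the Hadoop code base, time is passed in explicitly, the rollover is checked lazily on each call, and draining 'prev' before 'current' on a 'block received' message is a choice of this sketch (the proposal does not say which counter to decrement).

```java
// Illustrative sketch only; class and method names are hypothetical, not Hadoop APIs.
final class ScheduledBlockCounter {
    private final long rollIntervalMs;  // "every 5 minutes or so"
    private int current;                // approxBlocksScheduled
    private int previous;               // prevApproxBlocksScheduled
    private long lastRollMs;

    ScheduledBlockCounter(long rollIntervalMs, long nowMs) {
        this.rollIntervalMs = rollIntervalMs;
        this.lastRollMs = nowMs;
    }

    // Discard 'prev', move 'current' into it, and start a fresh interval.
    private void rollIfDue(long nowMs) {
        if (nowMs - lastRollMs >= rollIntervalMs) {
            previous = current;
            current = 0;
            lastRollMs = nowMs;
        }
    }

    // Called when a block is scheduled to this datanode.
    synchronized void blockScheduled(long nowMs) {
        rollIfDue(nowMs);
        current++;
    }

    // Called when a 'block received' message arrives for this datanode.
    synchronized void blockReceived(long nowMs) {
        rollIfDue(nowMs);
        if (previous > 0) {          // assumption: drain the older interval first
            previous--;
        } else if (current > 0) {
            current--;
        }
        // Both zero: not an error, the count was already aged out.
    }

    // Approximate number of blocks scheduled but not yet reported.
    synchronized int pending(long nowMs) {
        rollIfDue(nowMs);
        return current + previous;
    }

    // freespace_reported_in_heartbeat - (current + prev) * defaultBlockSize
    synchronized long estimatedFreeSpace(long reportedFreeBytes,
                                         long blockSizeBytes, long nowMs) {
        return reportedFreeBytes - (long) pending(nowMs) * blockSizeBytes;
    }

    public static void main(String[] args) {
        long fiveMin = 5 * 60 * 1000L;
        long blockSize = 64L * 1024 * 1024;  // assumed block size for the demo
        ScheduledBlockCounter c = new ScheduledBlockCounter(fiveMin, 0L);
        c.blockScheduled(0L);
        c.blockScheduled(1_000L);            // this block is never reported back
        c.blockReceived(2_000L);
        System.out.println(c.pending(3_000L));                                       // 1
        System.out.println(c.estimatedFreeSpace(10 * blockSize, blockSize, 3_000L)); // 603979776
        System.out.println(c.pending(fiveMin));                                      // 1 (moved to 'prev')
        System.out.println(c.pending(2 * fiveMin));                                  // 0 (aged out)
    }
}
```

In this sketch the unreported block disappears after two rollovers, which matches the "usually 10 min" estimate. A throttle on blocks scheduled for replication could compare {{pending()}} against a limit before scheduling more work.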
was (Author: rangadi):
Proposal that Hairong and I discussed (this seems safe enough for 0.17
and 0.18):
* For each datanode, the NameNode maintains a counter {{approxBlocksScheduled}}.
** Incremented each time a block is scheduled to the datanode.
** Decremented when the NameNode receives a 'block received' message from the datanode.
** No list of block ids is maintained.
** Not every scheduled block will eventually produce a 'block received'
message. The count will be corrected over time, as described below.
* Disk space left on a datanode will be estimated as ({{freespace_reported_in_heartbeat -
approxBlocksScheduled*defaultBlockSize}})
* 'approx' in the name of the variable is deliberate, since the count is not expected
to be very accurate. It will be handled like this:
** Another variable, {{prevApproxBlocksScheduled}}, is maintained.
** Every 5 minutes or so, the value of 'prev' will be discarded: 'prev' will be set
to the current value and current will be reset to zero.
** So if some blocks are never reported back by the datanode, the count
will get adjusted in 10 min.
** It's not an error if the NameNode receives a 'block received' message while this
counter is zero.
* This count will also be useful for throttling the number of blocks scheduled for
replication (maybe the limit could be something large, like 50 or 100).
> Frequent DiskOutOfSpaceException on almost-full datanodes
> ---------------------------------------------------------
>
> Key: HADOOP-3707
> URL: https://issues.apache.org/jira/browse/HADOOP-3707
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.17.0
> Reporter: Koji Noguchi
>
> On a datanode that is completely full (apart from the reserved space), we
> frequently see the target node reporting,
> {noformat}
> 2008-07-07 16:54:44,707 INFO org.apache.hadoop.dfs.DataNode: Receiving block blk_3328886742742952100 src: /11.1.11.111:22222 dest: /11.1.11.111:22222
> 2008-07-07 16:54:44,708 INFO org.apache.hadoop.dfs.DataNode: writeBlock blk_3328886742742952100 received exception org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Insufficient space for an additional block
> 2008-07-07 16:54:44,708 ERROR org.apache.hadoop.dfs.DataNode: 33.3.33.33:22222:DataXceiver: org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Insufficient space for an additional block
> at org.apache.hadoop.dfs.FSDataset$FSVolumeSet.getNextVolume(FSDataset.java:444)
> at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:716)
> at org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:2187)
> at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1113)
> at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:976)
> at java.lang.Thread.run(Thread.java:619)
> {noformat}
> Sender reporting
> {noformat}
> 2008-07-07 16:54:44,712 INFO org.apache.hadoop.dfs.DataNode: 11.1.11.111:22222:Exception writing block blk_3328886742742952100 to mirror 33.3.33.33:22222
> java.io.IOException: Broken pipe
> at sun.nio.ch.FileDispatcher.write0(Native Method)
> at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
> at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
> at sun.nio.ch.IOUtil.write(IOUtil.java:75)
> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
> at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:53)
> at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:140)
> at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:144)
> at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:105)
> at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
> at java.io.DataOutputStream.write(DataOutputStream.java:90)
> at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveChunk(DataNode.java:2292)
> at org.apache.hadoop.dfs.DataNode$BlockReceiver.receivePacket(DataNode.java:2411)
> at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock(DataNode.java:2476)
> at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1204)
> at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:976)
> at java.lang.Thread.run(Thread.java:619)
> {noformat}
> Since this does not happen constantly, my guess is that whenever a datanode
> gets a small amount of space available, the namenode over-assigns blocks to
> it, which can fail the block pipeline.
> (Note, before 0.17, namenode was much slower in assigning blocks)