[
https://issues.apache.org/jira/browse/HADOOP-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631508#action_12631508
]
sam rash commented on HADOOP-4180:
----------------------------------
I correlated this with an error at job submission. Below is another exception we
get client-side; it now appears more consistently, and it coincides with a
higher frequency of the "Got exception while reading blk_..." errors.
WARN 2008-09-16 18:52:41,977 DataStreamer for file
/mapred/system/job_200809052004_4597/job.jar block blk_-7411787837747554790
org.apache.hadoop.dfs.DFSClient DataStreamer Exception:
java.net.SocketException: Broken pipe
WARN 2008-09-16 18:52:41,977 DataStreamer for file
/mapred/system/job_200809052004_4597/job.jar block blk_-7411787837747554790
org.apache.hadoop.dfs.DFSClient Error Recovery for block
blk_-7411787837747554790 bad datanode[0] 10.16.160.81:50010
WARN 2008-09-16 18:52:41,978 DataStreamer for file
/mapred/system/job_200809052004_4597/job.jar block blk_-7411787837747554790
org.apache.hadoop.dfs.DFSClient Error Recovery for block
blk_-7411787837747554790 in pipeline 10.16.160.81:50010, 10.16.160.47:50010,
10.16.160.48:50010: bad datanode 10.16.160.81:50010
I then looked in the server logs, and the exception was indeed there:
2008-09-16 18:54:07,310 INFO org.apache.hadoop.dfs.DataNode: Receiving block
blk_-7411787837747554790 src: /10.16.160.102:36057 dest: /10.16.160.102:50010
2008-09-16 18:54:09,062 INFO org.apache.hadoop.dfs.DataNode: Received block
blk_-7411787837747554790 src: /10.16.160.102:36057 dest: /10.16.160.102:50010
of size 36564037
2008-09-16 18:54:14,388 INFO org.apache.hadoop.dfs.DataNode: Deleting block
blk_-7411787837747554790 file
/local/data/hadoop/d1/dfs/data/current/subdir36/subdir14/blk_-7411787837747554790
...
2008-09-16 18:54:49,955 WARN org.apache.hadoop.dfs.DataNode:
10.16.160.98:50010:Got exception while serving blk_-7411787837747554790 to
/10.16.160.104:
java.io.IOException: Block blk_-7411787837747554790 is not valid.
at org.apache.hadoop.dfs.FSDataset.getLength(FSDataset.java:567)
at org.apache.hadoop.dfs.DataNode$BlockSender.<init>(DataNode.java:1466)
at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:992)
at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:930)
at java.lang.Thread.run(Thread.java:619)
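To summarize the sequence the logs suggest, here is a deliberately simplified sketch. This is plain Java, not Hadoop internals; the class and the blockMap field are hypothetical stand-ins that only loosely mirror FSDataset. It models the suspected order of events: a block is received, an invalidation removes it from the block map, and a later read request then fails with "is not valid", as in the stack traces above.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the suspected delete-then-read sequence.
// Not actual Hadoop code; names only loosely follow FSDataset.
public class BlockRaceSketch {
    // Stand-in for the DataNode's block map: block id -> on-disk length.
    private final Map<Long, Long> blockMap = new HashMap<>();

    void receiveBlock(long blockId, long length) {
        blockMap.put(blockId, length);           // "Received block ..."
    }

    void deleteBlock(long blockId) throws IOException {
        if (blockMap.remove(blockId) == null) {  // "Block not found in blockMap."
            throw new IOException("Error in deleting blocks.");
        }
    }

    // Loosely mirrors FSDataset.getLength: a read after deletion fails.
    long getLength(long blockId) throws IOException {
        Long len = blockMap.get(blockId);
        if (len == null) {
            throw new IOException("Block blk_" + blockId + " is not valid.");
        }
        return len;
    }

    public static void main(String[] args) throws IOException {
        BlockRaceSketch node = new BlockRaceSketch();
        node.receiveBlock(-7411787837747554790L, 36564037L);
        node.deleteBlock(-7411787837747554790L); // invalidation deletes the replica
        try {
            node.getLength(-7411787837747554790L); // a client then asks to read it
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If this ordering is what actually happens, the open question is why the NameNode invalidates the replica and then still directs clients to this DataNode for the same block.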
> DataNode exceptions reading local disk
> --------------------------------------
>
> Key: HADOOP-4180
> URL: https://issues.apache.org/jira/browse/HADOOP-4180
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.16.4
> Reporter: sam rash
>
> We get hundreds of WARN-level exceptions per day indicating errors while
> trying to read local blocks. When this occurs, I've checked the local box's
> dfs.data.dir and the block is not present. Here is a relevant snippet
> from the logs regarding the missing block. It *looks* like the DataNode
> deletes the block and then tries to read it again later.
> NOTE: this block is for the job jar file, since up to 8 hosts show this
> exception for a single block while our data replication factor is only 3.
> 2008-09-16 00:27:05,232 INFO org.apache.hadoop.dfs.DataNode: Receiving block
> blk_-4689026510986275010 src: /10.16.160.78:40905 dest: /10.16.160.78:50010
> 2008-09-16 00:27:06,987 INFO org.apache.hadoop.dfs.DataNode: Received block
> blk_-4689026510986275010 src: /10.16.160.78:40905 dest: /10.16.160.78:50010
> of size 36431585
> 2008-09-16 00:27:08,440 INFO org.apache.hadoop.dfs.DataNode: Receiving block
> blk_-4002794066491997463 src: /10.16.160.69:35567 dest: /10.16.160.69:50010
> 2008-09-16 00:27:08,830 WARN org.apache.hadoop.dfs.DataNode: Unexpected error
> trying to delete block blk_7614043800226105600. Block not found in blockMap.
> 2008-09-16 00:27:08,838 WARN org.apache.hadoop.dfs.DataNode:
> java.io.IOException: Error in deleting blocks.
> at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:859)
> at org.apache.hadoop.dfs.DataNode.processCommand(DataNode.java:718)
> at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:597)
> at org.apache.hadoop.dfs.DataNode.run(DataNode.java:2431)
> at java.lang.Thread.run(Thread.java:619)
> 2008-09-16 00:27:10,251 INFO org.apache.hadoop.dfs.DataNode: Received block
> blk_-4002794066491997463 src: /10.16.160.69:35567 dest: /10.16.160.69:50010
> of size 36431585
> 2008-09-16 00:27:11,844 INFO org.apache.hadoop.dfs.DataNode: Deleting block
> blk_-4689026510986275010 file
> /local/data/hadoop/d0/dfs/data/current/subdir33/subdir38/blk_-4689026510986275010
> 2008-09-16 00:27:15,819 INFO org.apache.hadoop.dfs.DataNode: Receiving block
> blk_-5880042835256853020 src: /10.16.160.108:50565 dest: /10.16.160.108:50010
> 2008-09-16 00:27:17,623 INFO org.apache.hadoop.dfs.DataNode: Received block
> blk_-5880042835256853020 src: /10.16.160.108:50565 dest: /10.16.160.108:50010
> of size 36431585
> 2008-09-16 00:27:17,846 INFO org.apache.hadoop.dfs.DataNode: Deleting block
> blk_-4002794066491997463 file
> /local/data/hadoop/d1/dfs/data/current/subdir17/subdir50/blk_-4002794066491997463
> 2008-09-16 00:27:23,848 INFO org.apache.hadoop.dfs.DataNode: Deleting block
> blk_-5880042835256853020 file
> /local/data/hadoop/d0/dfs/data/current/subdir33/subdir38/blk_-5880042835256853020
> 2008-09-16 00:27:43,661 INFO org.apache.hadoop.dfs.DataNode:
> 10.16.160.49:50010 Served block blk_-2865745407489352337 to /10.16.160.49
> 2008-09-16 00:27:49,435 INFO org.apache.hadoop.dfs.DataNode:
> 10.16.160.49:50010 Served block blk_6570557978733593293 to /10.16.160.49
> 2008-09-16 00:27:53,676 INFO org.apache.hadoop.dfs.DataNode:
> 10.16.160.49:50010 Served block blk_8435126910829255466 to /10.16.160.49
> 2008-09-16 00:27:54,541 INFO org.apache.hadoop.dfs.DataNode:
> 10.16.160.49:50010 Served block blk_-4299511117827744453 to /10.16.160.49
> 2008-09-16 00:28:00,441 INFO org.apache.hadoop.dfs.DataNode:
> 10.16.160.49:50010 Served block blk_3186947853142559620 to /10.16.160.49
> 2008-09-16 00:28:05,229 INFO org.apache.hadoop.dfs.DataNode:
> 10.16.160.49:50010 Served block blk_-6450608467879552280 to /10.16.160.49
> 2008-09-16 00:28:09,301 INFO org.apache.hadoop.dfs.DataNode:
> 10.16.160.49:50010 Served block blk_6030697901457602946 to /10.16.160.113
> 2008-09-16 00:28:10,470 INFO org.apache.hadoop.dfs.DataNode:
> 10.16.160.49:50010 Served block blk_4663928918404305287 to /10.16.160.36
> 2008-09-16 00:28:10,537 WARN org.apache.hadoop.dfs.DataNode:
> 10.16.160.49:50010:Got exception while serving blk_-4689026510986275010 to
> /10.16.160.33:
> java.io.IOException: Block blk_-4689026510986275010 is not valid.
> at org.apache.hadoop.dfs.FSDataset.getLength(FSDataset.java:567)
> at org.apache.hadoop.dfs.DataNode$BlockSender.<init>(DataNode.java:1466)
> at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:992)
> at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:930)
> at java.lang.Thread.run(Thread.java:619)
> 2008-09-16 00:28:10,538 ERROR org.apache.hadoop.dfs.DataNode:
> 10.16.160.49:50010:DataXceiver: java.io.IOException: Block
> blk_-4689026510986275010 is not valid.
> at org.apache.hadoop.dfs.FSDataset.getLength(FSDataset.java:567)
> at org.apache.hadoop.dfs.DataNode$BlockSender.<init>(DataNode.java:1466)
> at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:992)
> at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:930)
> at java.lang.Thread.run(Thread.java:619)
> 2008-09-16 00:28:11,255 WARN org.apache.hadoop.dfs.DataNode:
> 10.16.160.49:50010:Got exception while serving blk_-4689026510986275010 to
> /10.16.160.108:
> java.io.IOException: Block blk_-4689026510986275010 is not valid.
> at org.apache.hadoop.dfs.FSDataset.getLength(FSDataset.java:567)
> at org.apache.hadoop.dfs.DataNode$BlockSender.<init>(DataNode.java:1466)
> at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:992)
> at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:930)
> at java.lang.Thread.run(Thread.java:619)
> 2008-09-16 00:28:11,255 ERROR org.apache.hadoop.dfs.DataNode:
> 10.16.160.49:50010:DataXceiver: java.io.IOException: Block
> blk_-4689026510986275010 is not valid.
> at org.apache.hadoop.dfs.FSDataset.getLength(FSDataset.java:567)
> at org.apache.hadoop.dfs.DataNode$BlockSender.<init>(DataNode.java:1466)
> at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:992)
> at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:930)
> at java.lang.Thread.run(Thread.java:619)
> 2008-09-16 00:28:12,973 WARN org.apache.hadoop.dfs.DataNode:
> 10.16.160.49:50010:Got exception while serving blk_-4689026510986275010 to
> /10.16.160.49:
> java.io.IOException: Block blk_-4689026510986275010 is not valid.
> at org.apache.hadoop.dfs.FSDataset.getLength(FSDataset.java:567)
> at org.apache.hadoop.dfs.DataNode$BlockSender.<init>(DataNode.java:1466)
> at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:992)
> at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:930)
> at java.lang.Thread.run(Thread.java:619)
> 2008-09-16 00:28:12,973 ERROR org.apache.hadoop.dfs.DataNode:
> 10.16.160.49:50010:DataXceiver: java.io.IOException: Block
> blk_-4689026510986275010 is not valid.
> at org.apache.hadoop.dfs.FSDataset.getLength(FSDataset.java:567)
> at org.apache.hadoop.dfs.DataNode$BlockSender.<init>(DataNode.java:1466)
> at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:992)
> at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:930)
> at java.lang.Thread.run(Thread.java:619)
> 2008-09-16 00:28:14,647 WARN org.apache.hadoop.dfs.DataNode:
> 10.16.160.49:50010:Got exception while serving blk_-4002794066491997463 to
> /10.16.160.79:
> java.io.IOException: Block blk_-4002794066491997463 is not valid.
> at org.apache.hadoop.dfs.FSDataset.getLength(FSDataset.java:567)
> at org.apache.hadoop.dfs.DataNode$BlockSender.<init>(DataNode.java:1466)
> at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:992)
> at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:930)
> at java.lang.Thread.run(Thread.java:619)
> 2008-09-16 00:28:14,647 ERROR org.apache.hadoop.dfs.DataNode:
> 10.16.160.49:50010:DataXceiver: java.io.IOException: Block
> blk_-4002794066491997463 is not valid.
> at org.apache.hadoop.dfs.FSDataset.getLength(FSDataset.java:567)
> at org.apache.hadoop.dfs.DataNode$BlockSender.<init>(DataNode.java:1466)
> at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:992)
> at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:930)
> at java.lang.Thread.run(Thread.java:619)
> 2008-09-16 00:28:50,873 WARN org.apache.hadoop.dfs.DataNode: Unexpected error
> trying to delete block blk_-4689026510986275010. Block not found in blockMap.
> 2008-09-16 00:28:50,880 WARN org.apache.hadoop.dfs.DataNode:
> java.io.IOException: Error in deleting blocks.
> at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:859)
> at org.apache.hadoop.dfs.DataNode.processCommand(DataNode.java:718)
> at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:597)
> at org.apache.hadoop.dfs.DataNode.run(DataNode.java:2431)
> at java.lang.Thread.run(Thread.java:619)
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.