[ https://issues.apache.org/jira/browse/HADOOP-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631508#action_12631508 ]

sam rash commented on HADOOP-4180:
----------------------------------

I correlated this with an error at job submission.

This is another exception we get client-side, and it appears to be happening 
more consistently now (it correlates with a higher frequency of the "Got 
exception while serving blk_..." errors):


WARN 2008-09-16 18:52:41,977 DataStreamer for file 
/mapred/system/job_200809052004_4597/job.jar block blk_-7411787837747554790 
org.apache.hadoop.dfs.DFSClient DataStreamer Exception: 
java.net.SocketException: Broken pipe
WARN 2008-09-16 18:52:41,977 DataStreamer for file 
/mapred/system/job_200809052004_4597/job.jar block blk_-7411787837747554790 
org.apache.hadoop.dfs.DFSClient Error Recovery for block 
blk_-7411787837747554790 bad datanode[0] 10.16.160.81:50010
WARN 2008-09-16 18:52:41,978 DataStreamer for file 
/mapred/system/job_200809052004_4597/job.jar block blk_-7411787837747554790 
org.apache.hadoop.dfs.DFSClient Error Recovery for block 
blk_-7411787837747554790 in pipeline 10.16.160.81:50010, 10.16.160.47:50010, 
10.16.160.48:50010: bad datanode 10.16.160.81:50010

I then looked in the DataNode logs on the server and found the corresponding exception:

2008-09-16 18:54:07,310 INFO org.apache.hadoop.dfs.DataNode: Receiving block 
blk_-7411787837747554790 src: /10.16.160.102:36057 dest: /10.16.160.102:50010
2008-09-16 18:54:09,062 INFO org.apache.hadoop.dfs.DataNode: Received block 
blk_-7411787837747554790 src: /10.16.160.102:36057 dest: /10.16.160.102:50010 
of size 36564037
2008-09-16 18:54:14,388 INFO org.apache.hadoop.dfs.DataNode: Deleting block 
blk_-7411787837747554790 file 
/local/data/hadoop/d1/dfs/data/current/subdir36/subdir14/blk_-7411787837747554790
...
2008-09-16 18:54:49,955 WARN org.apache.hadoop.dfs.DataNode: 
10.16.160.98:50010:Got exception while serving blk_-7411787837747554790 to 
/10.16.160.104:
java.io.IOException: Block blk_-7411787837747554790 is not valid.
        at org.apache.hadoop.dfs.FSDataset.getLength(FSDataset.java:567)
        at org.apache.hadoop.dfs.DataNode$BlockSender.<init>(DataNode.java:1466)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:992)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:930)
        at java.lang.Thread.run(Thread.java:619)
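To confirm a block is really gone from disk, here is a small hypothetical helper (not part of Hadoop) that scans a DataNode data directory for a given block id; the layout (current/subdirNN/.../blk_<id>) matches the paths in the logs above, and the path passed in __main__ is just an example from this cluster:

```python
#!/usr/bin/env python
# Hypothetical diagnostic helper: search a dfs.data.dir tree for a block file.
# Layout in this era of Hadoop: <data_dir>/current/subdirNN/.../blk_<id>
import os

def find_block(data_dir, block_id):
    """Return paths of files for the given block id under data_dir, if any."""
    name = "blk_%d" % block_id
    hits = []
    for root, _dirs, files in os.walk(data_dir):
        for f in files:
            # match the block file itself and its checksum (.meta) file
            if f == name or (f.startswith(name) and f.endswith(".meta")):
                hits.append(os.path.join(root, f))
    return hits

if __name__ == "__main__":
    # example path and block id taken from the logs above
    print(find_block("/local/data/hadoop/d1/dfs/data", -7411787837747554790))
```

Running this on the affected host after the "Deleting block" log line returns an empty list, consistent with the block being removed before the later read attempt.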

> DataNode exceptions reading local disk
> --------------------------------------
>
>                 Key: HADOOP-4180
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4180
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.16.4
>            Reporter: sam rash
>
> We get hundreds of exceptions at WARN level per day indicating errors while 
> trying to read local blocks.  When this occurs, I've checked the local 
> box's dfs.data.dir and the block is not present.  Here is a relevant snippet 
> from the logs regarding the missing block.  It *looks* like the DataNode 
> deletes the block and then tries to read it again later.
> NOTE: this appears to be for the job jar file, since up to 8 hosts show this 
> exception for a single block while our data replication factor is only 3.
> 2008-09-16 00:27:05,232 INFO org.apache.hadoop.dfs.DataNode: Receiving block 
> blk_-4689026510986275010 src: /10.16.160.78:40905 dest: /10.16.160.78:50010
> 2008-09-16 00:27:06,987 INFO org.apache.hadoop.dfs.DataNode: Received block 
> blk_-4689026510986275010 src: /10.16.160.78:40905 dest: /10.16.160.78:50010 
> of size 36431585
> 2008-09-16 00:27:08,440 INFO org.apache.hadoop.dfs.DataNode: Receiving block 
> blk_-4002794066491997463 src: /10.16.160.69:35567 dest: /10.16.160.69:50010
> 2008-09-16 00:27:08,830 WARN org.apache.hadoop.dfs.DataNode: Unexpected error 
> trying to delete block blk_7614043800226105600. Block not found in blockMap. 
> 2008-09-16 00:27:08,838 WARN org.apache.hadoop.dfs.DataNode: 
> java.io.IOException: Error in deleting blocks.
>         at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:859)
>         at org.apache.hadoop.dfs.DataNode.processCommand(DataNode.java:718)
>         at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:597)
>         at org.apache.hadoop.dfs.DataNode.run(DataNode.java:2431)
>         at java.lang.Thread.run(Thread.java:619)
> 2008-09-16 00:27:10,251 INFO org.apache.hadoop.dfs.DataNode: Received block 
> blk_-4002794066491997463 src: /10.16.160.69:35567 dest: /10.16.160.69:50010 
> of size 36431585
> 2008-09-16 00:27:11,844 INFO org.apache.hadoop.dfs.DataNode: Deleting block 
> blk_-4689026510986275010 file 
> /local/data/hadoop/d0/dfs/data/current/subdir33/subdir38/blk_-4689026510986275010
> 2008-09-16 00:27:15,819 INFO org.apache.hadoop.dfs.DataNode: Receiving block 
> blk_-5880042835256853020 src: /10.16.160.108:50565 dest: /10.16.160.108:50010
> 2008-09-16 00:27:17,623 INFO org.apache.hadoop.dfs.DataNode: Received block 
> blk_-5880042835256853020 src: /10.16.160.108:50565 dest: /10.16.160.108:50010 
> of size 36431585
> 2008-09-16 00:27:17,846 INFO org.apache.hadoop.dfs.DataNode: Deleting block 
> blk_-4002794066491997463 file 
> /local/data/hadoop/d1/dfs/data/current/subdir17/subdir50/blk_-4002794066491997463
> 2008-09-16 00:27:23,848 INFO org.apache.hadoop.dfs.DataNode: Deleting block 
> blk_-5880042835256853020 file 
> /local/data/hadoop/d0/dfs/data/current/subdir33/subdir38/blk_-5880042835256853020
> 2008-09-16 00:27:43,661 INFO org.apache.hadoop.dfs.DataNode: 
> 10.16.160.49:50010 Served block blk_-2865745407489352337 to /10.16.160.49
> 2008-09-16 00:27:49,435 INFO org.apache.hadoop.dfs.DataNode: 
> 10.16.160.49:50010 Served block blk_6570557978733593293 to /10.16.160.49
> 2008-09-16 00:27:53,676 INFO org.apache.hadoop.dfs.DataNode: 
> 10.16.160.49:50010 Served block blk_8435126910829255466 to /10.16.160.49
> 2008-09-16 00:27:54,541 INFO org.apache.hadoop.dfs.DataNode: 
> 10.16.160.49:50010 Served block blk_-4299511117827744453 to /10.16.160.49
> 2008-09-16 00:28:00,441 INFO org.apache.hadoop.dfs.DataNode: 
> 10.16.160.49:50010 Served block blk_3186947853142559620 to /10.16.160.49
> 2008-09-16 00:28:05,229 INFO org.apache.hadoop.dfs.DataNode: 
> 10.16.160.49:50010 Served block blk_-6450608467879552280 to /10.16.160.49
> 2008-09-16 00:28:09,301 INFO org.apache.hadoop.dfs.DataNode: 
> 10.16.160.49:50010 Served block blk_6030697901457602946 to /10.16.160.113
> 2008-09-16 00:28:10,470 INFO org.apache.hadoop.dfs.DataNode: 
> 10.16.160.49:50010 Served block blk_4663928918404305287 to /10.16.160.36
> 2008-09-16 00:28:10,537 WARN org.apache.hadoop.dfs.DataNode: 
> 10.16.160.49:50010:Got exception while serving blk_-4689026510986275010 to 
> /10.16.160.33:
> java.io.IOException: Block blk_-4689026510986275010 is not valid.
>         at org.apache.hadoop.dfs.FSDataset.getLength(FSDataset.java:567)
>         at org.apache.hadoop.dfs.DataNode$BlockSender.<init>(DataNode.java:1466)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:992)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:930)
>         at java.lang.Thread.run(Thread.java:619)
> 2008-09-16 00:28:10,538 ERROR org.apache.hadoop.dfs.DataNode: 
> 10.16.160.49:50010:DataXceiver: java.io.IOException: Block 
> blk_-4689026510986275010 is not valid.
>         at org.apache.hadoop.dfs.FSDataset.getLength(FSDataset.java:567)
>         at org.apache.hadoop.dfs.DataNode$BlockSender.<init>(DataNode.java:1466)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:992)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:930)
>         at java.lang.Thread.run(Thread.java:619)
> 2008-09-16 00:28:11,255 WARN org.apache.hadoop.dfs.DataNode: 
> 10.16.160.49:50010:Got exception while serving blk_-4689026510986275010 to 
> /10.16.160.108:
> java.io.IOException: Block blk_-4689026510986275010 is not valid.
>         at org.apache.hadoop.dfs.FSDataset.getLength(FSDataset.java:567)
>         at org.apache.hadoop.dfs.DataNode$BlockSender.<init>(DataNode.java:1466)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:992)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:930)
>         at java.lang.Thread.run(Thread.java:619)
> 2008-09-16 00:28:11,255 ERROR org.apache.hadoop.dfs.DataNode: 
> 10.16.160.49:50010:DataXceiver: java.io.IOException: Block 
> blk_-4689026510986275010 is not valid.
>         at org.apache.hadoop.dfs.FSDataset.getLength(FSDataset.java:567)
>         at org.apache.hadoop.dfs.DataNode$BlockSender.<init>(DataNode.java:1466)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:992)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:930)
>         at java.lang.Thread.run(Thread.java:619)
> 2008-09-16 00:28:12,973 WARN org.apache.hadoop.dfs.DataNode: 
> 10.16.160.49:50010:Got exception while serving blk_-4689026510986275010 to 
> /10.16.160.49:
> java.io.IOException: Block blk_-4689026510986275010 is not valid.
>         at org.apache.hadoop.dfs.FSDataset.getLength(FSDataset.java:567)
>         at org.apache.hadoop.dfs.DataNode$BlockSender.<init>(DataNode.java:1466)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:992)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:930)
>         at java.lang.Thread.run(Thread.java:619)
> 2008-09-16 00:28:12,973 ERROR org.apache.hadoop.dfs.DataNode: 
> 10.16.160.49:50010:DataXceiver: java.io.IOException: Block 
> blk_-4689026510986275010 is not valid.
>         at org.apache.hadoop.dfs.FSDataset.getLength(FSDataset.java:567)
>         at org.apache.hadoop.dfs.DataNode$BlockSender.<init>(DataNode.java:1466)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:992)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:930)
>         at java.lang.Thread.run(Thread.java:619)
> 2008-09-16 00:28:14,647 WARN org.apache.hadoop.dfs.DataNode: 
> 10.16.160.49:50010:Got exception while serving blk_-4002794066491997463 to 
> /10.16.160.79:
> java.io.IOException: Block blk_-4002794066491997463 is not valid.
>         at org.apache.hadoop.dfs.FSDataset.getLength(FSDataset.java:567)
>         at org.apache.hadoop.dfs.DataNode$BlockSender.<init>(DataNode.java:1466)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:992)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:930)
>         at java.lang.Thread.run(Thread.java:619)
> 2008-09-16 00:28:14,647 ERROR org.apache.hadoop.dfs.DataNode: 
> 10.16.160.49:50010:DataXceiver: java.io.IOException: Block 
> blk_-4002794066491997463 is not valid.
>         at org.apache.hadoop.dfs.FSDataset.getLength(FSDataset.java:567)
>         at org.apache.hadoop.dfs.DataNode$BlockSender.<init>(DataNode.java:1466)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:992)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:930)
>         at java.lang.Thread.run(Thread.java:619)
> 2008-09-16 00:28:50,873 WARN org.apache.hadoop.dfs.DataNode: Unexpected error 
> trying to delete block blk_-4689026510986275010. Block not found in blockMap. 
> 2008-09-16 00:28:50,880 WARN org.apache.hadoop.dfs.DataNode: 
> java.io.IOException: Error in deleting blocks.
>         at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:859)
>         at org.apache.hadoop.dfs.DataNode.processCommand(DataNode.java:718)
>         at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:597)
>         at org.apache.hadoop.dfs.DataNode.run(DataNode.java:2431)
>         at java.lang.Thread.run(Thread.java:619)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.