Hello, 

I'm running Hadoop 0.16.1 and have an issue with the DFS: writes to
HDFS sometimes block. It happens only intermittently, so it is not
easy to reproduce.

My cluster has 4 nodes plus one master running the NameNode and
JobTracker. These are the logs that appear when everything blocks. Look
at block blk_7857709233639057851, which seems to be the problematic one.
It raises the exception:

"Exception in receiveBlock for block  java.io.IOException: Trying to
change block file offset of block blk_7857709233639057851 to 33357824
but actual size of file is 33353728"

A longer excerpt of the logs, followed by part of the thread dump:

hn3: 2008-03-28 07:34:44,499 INFO org.apache.hadoop.dfs.DataNode:
Receiving block blk_7857709233639057851 src: /172.16.3.2:46092
dest: /172.16.3.2:50010
hn3: 2008-03-28 07:34:44,501 INFO org.apache.hadoop.dfs.DataNode:
Datanode 2 got response for connect ack  from downstream datanode with
firstbadlink as 
hn3: 2008-03-28 07:34:44,501 INFO org.apache.hadoop.dfs.DataNode:
Datanode 2 forwarding connect ack to upstream firstbadlink is 
hn2: 2008-03-28 07:34:44,496 INFO org.apache.hadoop.dfs.DataNode:
Received block blk_8152094109584962620 of size 67108864 from /172.16.3.2
hn2: 2008-03-28 07:34:44,496 INFO org.apache.hadoop.dfs.DataNode:
PacketResponder 2 for block blk_8152094109584962620 terminating
hn2: 2008-03-28 07:34:44,500 INFO org.apache.hadoop.dfs.DataNode:
Receiving block blk_7857709233639057851 src: /172.16.3.5:35904
dest: /172.16.3.5:50010
hn2: 2008-03-28 07:34:44,502 INFO org.apache.hadoop.dfs.DataNode:
Datanode 1 got response for connect ack  from downstream datanode with
firstbadlink as 
hn2: 2008-03-28 07:34:44,502 INFO org.apache.hadoop.dfs.DataNode:
Datanode 1 forwarding connect ack to upstream firstbadlink is 
hn1: 2008-03-28 07:34:44,495 INFO org.apache.hadoop.dfs.DataNode:
Received block blk_8152094109584962620 of size 67108864 from /172.16.3.4
hn1: 2008-03-28 07:34:44,495 INFO org.apache.hadoop.dfs.DataNode:
PacketResponder 1 for block blk_8152094109584962620 terminating
hn4: 2008-03-28 07:34:44,501 INFO org.apache.hadoop.dfs.DataNode:
Receiving block blk_7857709233639057851 src: /172.16.3.4:36887
dest: /172.16.3.4:50010
hn4: 2008-03-28 07:34:44,501 INFO org.apache.hadoop.dfs.DataNode:
Datanode 0 forwarding connect ack to upstream firstbadlink is 
hn4: 2008-03-28 07:34:44,615 INFO org.apache.hadoop.dfs.DataNode:
Changing block file offset of block blk_7857709233639057851 from 4325376
to 4325376 meta file offset to 33799
hn3: 2008-03-28 07:34:45,304 INFO org.apache.hadoop.dfs.DataNode:
Changing block file offset of block blk_7857709233639057851 from
33353728 to 33357824 meta file offset to 260615
hn3: 2008-03-28 07:34:45,305 INFO org.apache.hadoop.dfs.DataNode:
Exception in receiveBlock for block  java.io.IOException: Trying to
change block file offset of block blk_7857709233639057851 to 33357824
but actual size of file is 33353728
hn1: 2008-03-28 07:35:31,835 INFO org.apache.hadoop.dfs.DataNode:
BlockReport of 564 blocks got processed in 128 msecs

Full thread dump Java HotSpot(TM) 64-Bit Server VM (10.0-b19 mixed mode):

"ResponseProcessor for block blk_7857709233639057851" prio=10 tid=0x000000005c557800 nid=0x23ad waiting for monitor entry [0x0000000040e15000..0x0000000040e15a10]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:1771)
        - waiting to lock <0x00002aaab43ad910> (a java.util.LinkedList)

"DataStreamer for file /user/properazzi/test/output/index/_0.cfs block blk_7857709233639057851" prio=10 tid=0x000000005c59f000 nid=0x2392 runnable [0x0000000041219000..0x0000000041219d10]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        - locked <0x00002aaade9b8120> (a java.io.BufferedOutputStream)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        - locked <0x00002aaade9b8148> (a java.io.DataOutputStream)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1623)
        - locked <0x00002aaab43ad910> (a java.util.LinkedList)

"[EMAIL PROTECTED]" daemon prio=10 tid=0x000000005c7f1000 nid=0x2254 waiting on condition [0x0000000041118000..0x0000000041118a90]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.dfs.DFSClient$LeaseChecker.run(DFSClient.java:597)
        at java.lang.Thread.run(Thread.java:619)

"[EMAIL PROTECTED]" daemon prio=10 tid=0x000000005c4fec00 nid=0x224f waiting on condition [0x0000000040f16000..0x0000000040f16c90]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.dfs.DFSClient$LeaseChecker.run(DFSClient.java:597)
        at java.lang.Thread.run(Thread.java:619)

"org.apache.hadoop.io.ObjectWritable Connection Culler" daemon prio=10 tid=0x000000005c7c5c00 nid=0x224d waiting on condition [0x0000000040d14000..0x0000000040d14b90]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.ipc.Client$ConnectionCuller.run(Client.java:423)

"main" prio=10 tid=0x000000005c417000 nid=0x223b waiting for monitor entry [0x0000000040207000..0x0000000040209ed0]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:2117)
        - waiting to lock <0x00002aaab43ad910> (a java.util.LinkedList)
        at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:141)
        at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:100)
        at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
        - locked <0x00002aaab43addd8> (a org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:41)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        - locked <0x00002aaab43aef18> (a org.apache.hadoop.fs.FSDataOutputStream)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:47)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:83)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:157)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:151)
        at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1028)
        at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1016)
        at org.apache.hadoop.fs.FileSystem.moveFromLocalFile(FileSystem.java:1006)
        at org.apache.hadoop.fs.FileSystem.completeLocalOutput(FileSystem.java:1077)
        ...
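
In case it helps anyone reading the dump: the lock relationships in it can be
extracted mechanically by matching each "waiting to lock <address>" line to the
thread that shows "locked <address>" for the same address. Below is a minimal
sketch of that idea. The class name LockOwnerFinder and the method findBlockers
are my own hypothetical names, not part of Hadoop or the JDK, and the dump
embedded in main is an abbreviated excerpt of the one above with the thread
names shortened.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Hadoop): maps each blocked thread in a
// HotSpot thread dump to the thread holding the monitor it is waiting for.
final class LockOwnerFinder {
    // A thread entry starts with a quoted thread name at the start of a line.
    private static final Pattern HEADER = Pattern.compile("^\"([^\"]*)");
    private static final Pattern LOCKED =
        Pattern.compile("- locked <(0x[0-9a-fA-F]+)>");
    private static final Pattern WAITING =
        Pattern.compile("- waiting to lock <(0x[0-9a-fA-F]+)>");

    /** Returns waiting thread name -> name of the thread that owns the lock. */
    static Map<String, String> findBlockers(String dump) {
        Map<String, String> ownerByLock = new HashMap<>();
        Map<String, String> lockByWaiter = new LinkedHashMap<>();
        String current = null;
        for (String line : dump.split("\\R")) {
            Matcher header = HEADER.matcher(line);
            if (header.find()) {
                current = header.group(1);
                continue;
            }
            if (current == null) {
                continue; // preamble before the first thread entry
            }
            Matcher waiting = WAITING.matcher(line);
            if (waiting.find()) {
                lockByWaiter.put(current, waiting.group(1));
                continue;
            }
            Matcher locked = LOCKED.matcher(line);
            if (locked.find()) {
                ownerByLock.put(locked.group(1), current);
            }
        }
        Map<String, String> blockers = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : lockByWaiter.entrySet()) {
            String owner = ownerByLock.getOrDefault(e.getValue(), "<owner not in dump>");
            blockers.put(e.getKey(), owner);
        }
        return blockers;
    }

    public static void main(String[] args) {
        // Abbreviated excerpt of the dump above; thread names are shortened.
        String dump = String.join("\n",
            "\"ResponseProcessor\" prio=10",
            "        - waiting to lock <0x00002aaab43ad910> (a java.util.LinkedList)",
            "\"DataStreamer\" prio=10",
            "        at java.net.SocketOutputStream.socketWrite0(Native Method)",
            "        - locked <0x00002aaab43ad910> (a java.util.LinkedList)",
            "\"main\" prio=10",
            "        - waiting to lock <0x00002aaab43ad910> (a java.util.LinkedList)");
        findBlockers(dump).forEach((waiter, owner) ->
            System.out.println(waiter + " is blocked by " + owner));
    }
}
```

On the excerpt this reports that both the ResponseProcessor thread and "main"
are blocked by the DataStreamer thread, which holds the java.util.LinkedList
monitor while it sits in a socket write. The sketch only pairs addresses; it
does not handle threads parked in Object.wait() or java.util.concurrent locks.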

Any help with this? Please ask if you need more information.

Thanks, and congratulations on your revolutionary project.

Iván de Prado Alonso
http://ivandeprado.blogspot.com/


