[
https://issues.apache.org/jira/browse/HBASE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931926#action_12931926
]
Jonathan Gray commented on HBASE-3234:
--------------------------------------
From a new run, here is all of the activity of the block that causes the
abort:
{noformat}
2010-11-14 15:44:00,042 DEBUG [IPC Server handler 4 on 40196]
namenode.FSDirectory(273): DIR* FSDirectory.addFile:
/user/jgray/.logs/dev692.sf2p.facebook.com,58498,1289778232413/10.17.83.207%3A58498.1289778233005
with blk_4643386764144918409_1009 block is added to the in-memory file system
2010-11-14 15:44:00,042 INFO [IPC Server handler 4 on 40196]
namenode.FSNamesystem(1482): BLOCK* NameSystem.allocateBlock:
/user/jgray/.logs/dev692.sf2p.facebook.com,58498,1289778232413/10.17.83.207%3A58498.1289778233005.
blk_4643386764144918409_1009
2010-11-14 15:44:00,044 INFO
[org.apache.hadoop.hdfs.server.datanode.dataxcei...@6a25b72a]
datanode.DataXceiver(228): Receiving block blk_4643386764144918409_1009 src:
/127.0.0.1:48990 dest: /127.0.0.1:48738
2010-11-14 15:44:00,044 DEBUG
[org.apache.hadoop.hdfs.server.datanode.dataxcei...@6a25b72a]
datanode.FSDataset(1517): b=blk_4643386764144918409_1009, f=null
2010-11-14 15:44:00,044 DEBUG
[org.apache.hadoop.hdfs.server.datanode.dataxcei...@6a25b72a]
datanode.FSDataset(1313): writeTo metafile is
/var/users/jgray/hbase/trunk/target/test-data/5173adaa-a370-457d-8293-8f059f70bd31/dfs/data/data1/blocksBeingWritten/blk_4643386764144918409_1009.meta
of size 0
2010-11-14 15:44:00,045 DEBUG [PacketResponder 0 for Block
blk_4643386764144918409_1009] datanode.BlockReceiver$PacketResponder(789):
PacketResponder 0 seqno = -2 for block blk_4643386764144918409_1009 waiting for
local datanode to finish write.
2010-11-14 15:44:00,045 DEBUG [DataStreamer for file
/user/jgray/.logs/dev692.sf2p.facebook.com,58498,1289778232413/10.17.83.207%3A58498.1289778233005
block blk_4643386764144918409_1009]
hdfs.DFSClient$DFSOutputStream$DataStreamer(2429): DataStreamer block
blk_4643386764144918409_1009 wrote packet seqno:0 size:339 offsetInBlock:0
lastPacketInBlock:false
2010-11-14 15:44:00,045 DEBUG
[org.apache.hadoop.hdfs.server.datanode.dataxcei...@6a25b72a]
datanode.BlockReceiver(393): Receiving one packet for block
blk_4643386764144918409_1009 of length 318 seqno 0 offsetInBlock 0
lastPacketInBlock false
2010-11-14 15:44:00,046 DEBUG [PacketResponder 0 for Block
blk_4643386764144918409_1009] datanode.BlockReceiver$PacketResponder(891):
PacketResponder 0 for block blk_4643386764144918409_1009 responded an ack:
Replies for seqno 0 are SUCCESS
2010-11-14 15:44:00,046 DEBUG [ResponseProcessor for block
blk_4643386764144918409_1009]
hdfs.DFSClient$DFSOutputStream$ResponseProcessor(2534): DFSClient Replies for
seqno 0 are SUCCESS
2010-11-14 15:44:00,046 DEBUG [PacketResponder 0 for Block
blk_4643386764144918409_1009] datanode.BlockReceiver$PacketResponder(789):
PacketResponder 0 seqno = -2 for block blk_4643386764144918409_1009 waiting for
local datanode to finish write.
2010-11-14 15:44:31,551 DEBUG [DataStreamer for file
/user/jgray/.logs/dev692.sf2p.facebook.com,58498,1289778232413/10.17.83.207%3A58498.1289778233005
block blk_4643386764144918409_1009]
hdfs.DFSClient$DFSOutputStream$DataStreamer(2429): DataStreamer block
blk_4643386764144918409_1009 wrote packet seqno:-1 size:25 offsetInBlock:0
lastPacketInBlock:false
2010-11-14 15:44:31,551 DEBUG
[org.apache.hadoop.hdfs.server.datanode.dataxcei...@6a25b72a]
datanode.BlockReceiver(393): Receiving one packet for block
blk_4643386764144918409_1009 of length 4 seqno -1 offsetInBlock 0
lastPacketInBlock false
2010-11-14 15:44:31,551 DEBUG
[org.apache.hadoop.hdfs.server.datanode.dataxcei...@6a25b72a]
datanode.BlockReceiver(636): Changing block file offset of block
blk_4643386764144918409_1009 from 310 to 0 meta file offset to 7
2010-11-14 15:44:31,552 DEBUG
[org.apache.hadoop.hdfs.server.datanode.dataxcei...@6a25b72a]
datanode.BlockReceiver(422): Receiving empty packet for block
blk_4643386764144918409_1009
2010-11-14 15:44:31,552 DEBUG [PacketResponder 0 for Block
blk_4643386764144918409_1009] datanode.BlockReceiver$PacketResponder(891):
PacketResponder 0 for block blk_4643386764144918409_1009 responded an ack:
Replies for seqno -1 are SUCCESS
2010-11-14 15:44:31,552 DEBUG [ResponseProcessor for block
blk_4643386764144918409_1009]
hdfs.DFSClient$DFSOutputStream$ResponseProcessor(2534): DFSClient Replies for
seqno -1 are SUCCESS
2010-11-14 15:44:31,552 DEBUG [PacketResponder 0 for Block
blk_4643386764144918409_1009] datanode.BlockReceiver$PacketResponder(789):
PacketResponder 0 seqno = -2 for block blk_4643386764144918409_1009 waiting for
local datanode to finish write.
2010-11-14 15:44:31,867 DEBUG [DataStreamer for file
/user/jgray/.logs/dev692.sf2p.facebook.com,58498,1289778232413/10.17.83.207%3A58498.1289778233005
block blk_4643386764144918409_1009]
hdfs.DFSClient$DFSOutputStream$DataStreamer(2429): DataStreamer block
blk_4643386764144918409_1009 wrote packet seqno:1 size:804 offsetInBlock:0
lastPacketInBlock:false
2010-11-14 15:44:31,868 DEBUG
[org.apache.hadoop.hdfs.server.datanode.dataxcei...@6a25b72a]
datanode.BlockReceiver(393): Receiving one packet for block
blk_4643386764144918409_1009 of length 783 seqno 1 offsetInBlock 0
lastPacketInBlock false
2010-11-14 15:44:31,868 DEBUG [PacketResponder 0 for Block
blk_4643386764144918409_1009] datanode.BlockReceiver$PacketResponder(891):
PacketResponder 0 for block blk_4643386764144918409_1009 responded an ack:
Replies for seqno 1 are SUCCESS
2010-11-14 15:44:31,868 DEBUG [ResponseProcessor for block
blk_4643386764144918409_1009]
hdfs.DFSClient$DFSOutputStream$ResponseProcessor(2534): DFSClient Replies for
seqno 0 are FAILED
2010-11-14 15:44:31,868 DEBUG [PacketResponder 0 for Block
blk_4643386764144918409_1009] datanode.BlockReceiver$PacketResponder(789):
PacketResponder 0 seqno = -2 for block blk_4643386764144918409_1009 waiting for
local datanode to finish write.
2010-11-14 15:44:31,868 WARN [ResponseProcessor for block
blk_4643386764144918409_1009]
hdfs.DFSClient$DFSOutputStream$ResponseProcessor(2580): DFSOutputStream
ResponseProcessor exception for block
blk_4643386764144918409_1009 java.io.IOException: Bad response 1 for block
blk_4643386764144918409_1009 from datanode 127.0.0.1:48738
2010-11-14 15:44:31,868 WARN [DataStreamer for file
/user/jgray/.logs/dev692.sf2p.facebook.com,58498,1289778232413/10.17.83.207%3A58498.1289778233005
block blk_4643386764144918409_1009] hdfs.DFSClient$DFSOutputStream(2616):
Error Recovery for block blk_4643386764144918409_1009 bad datanode[0]
127.0.0.1:48738
2010-11-14 15:44:31,869 INFO
[org.apache.hadoop.hdfs.server.datanode.dataxcei...@6a25b72a]
datanode.BlockReceiver(565): Exception in receiveBlock for block
blk_4643386764144918409_1009 java.io.IOException: Connection reset by peer
2010-11-14 15:44:31,869 INFO [PacketResponder 0 for Block
blk_4643386764144918409_1009] datanode.BlockReceiver$PacketResponder(844):
PacketResponder blk_4643386764144918409_1009 0 : Thread is interrupted.
2010-11-14 15:44:31,869 INFO [PacketResponder 0 for Block
blk_4643386764144918409_1009] datanode.BlockReceiver$PacketResponder(907):
PacketResponder 0 for block blk_4643386764144918409_1009 terminating
2010-11-14 15:44:31,869 INFO
[org.apache.hadoop.hdfs.server.datanode.dataxcei...@6a25b72a]
datanode.DataXceiver(377): writeBlock blk_4643386764144918409_1009 received
exception java.io.IOException: Connection reset by peer
2010-11-14 15:45:32,574 INFO [IPC Server handler 8 on 40196]
namenode.INodeFileUnderConstruction(212): BLOCK* blk_4643386764144918409_1009
recovery started, primary=127.0.0.1:48738
2010-11-14 15:45:34,183 INFO
[org.apache.hadoop.hdfs.server.datanode.datanod...@26c94114]
datanode.DataNode(1632): NameNode calls
recoverBlock(block=blk_4643386764144918409_1009, targets=[127.0.0.1:48738])
2010-11-14 15:45:34,183 DEBUG
[org.apache.hadoop.hdfs.server.datanode.datanod...@26c94114]
datanode.DataNode(1409): block=blk_4643386764144918409_1009
2010-11-14 15:45:34,184 DEBUG
[org.apache.hadoop.hdfs.server.datanode.datanod...@26c94114]
datanode.DataNode(1419): getBlockMetaDataInfo successful
block=blk_4643386764144918409_1009 length 771 genstamp 1009
2010-11-14 15:45:34,185 DEBUG
[org.apache.hadoop.hdfs.server.datanode.datanod...@26c94114]
datanode.DataNode(1563): block=blk_4643386764144918409_1009, (length=771),
syncList=[block:blk_4643386764144918409_1009 node:127.0.0.1:48738],
closeFile=true
2010-11-14 15:45:34,186 INFO
[org.apache.hadoop.hdfs.server.datanode.datanod...@26c94114]
datanode.DataNode(1450): oldblock=blk_4643386764144918409_1009(length=771),
newblock=blk_4643386764144918409_1014(length=771), datanode=127.0.0.1:48738
2010-11-14 15:45:34,188 INFO [IPC Server handler 1 on 40196]
namenode.FSNamesystem(3149): BLOCK* NameSystem.addStoredBlock: blockMap
updated: 127.0.0.1:48738 is added to blk_4643386764144918409_1009 size 771
2010-11-14 15:45:34,189 INFO [IPC Server handler 2 on 40196]
namenode.FSNamesystem(1970):
commitBlockSynchronization(lastblock=blk_4643386764144918409_1009,
newgenerationstamp=1014, newlength=771, newtargets=[127.0.0.1:48738],
closeFile=true, deleteBlock=false)
{noformat}
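Note the disagreement in the acks above: the datanode's PacketResponder
logs "Replies for seqno 1 are SUCCESS", but the client's ResponseProcessor
reads that same ack off the wire as "Replies for seqno 0 are FAILED". That
FAILED status is what trips the "Bad response" path and marks the only
datanode bad. For reference, a stripped-down sketch of the client-side ack
bookkeeping involved (illustrative names only, not the actual DFSClient
code):
{noformat}
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified model of the DFSClient-side ack check: the streamer queues
// each packet's seqno when the packet is sent; the response processor
// dequeues on a SUCCESS ack.  A non-SUCCESS reply (the "Bad response"
// in the log above), or a seqno that does not match the head of the
// queue, marks the pipeline bad and kicks off error recovery.
class AckTrackerSketch {
  static final int HEARTBEAT_SEQNO = -1; // heartbeat packets use seqno -1
  static final int SUCCESS = 0;          // reply 0 = OK, nonzero = error

  private final Deque<Integer> ackQueue = new ArrayDeque<Integer>();

  void packetSent(int seqno) {
    if (seqno != HEARTBEAT_SEQNO) {
      ackQueue.addLast(seqno);           // now awaiting an ack for it
    }
  }

  void ackReceived(int seqno, int reply) throws IOException {
    if (seqno == HEARTBEAT_SEQNO) {
      return;                            // heartbeat acks dequeue nothing
    }
    if (reply != SUCCESS) {
      // The path taken here: the client sees FAILED for seqno 0 even
      // though the datanode acked seqno 1 SUCCESS.
      throw new IOException("Bad response " + reply + " for seqno " + seqno);
    }
    Integer expected = ackQueue.pollFirst();
    if (expected == null || expected.intValue() != seqno) {
      throw new IOException("Expecting ack for seqno " + expected
          + " but received seqno " + seqno);
    }
  }
}
{noformat}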
All of the recovery activity happens after the abort. Here is the snippet
from this log around the exception/abort:
{noformat}
2010-11-14 15:44:31,867 DEBUG [PRI IPC Server handler 3 on 58498]
hdfs.DFSClient$DFSOutputStream(3154): DFSClient writeChunk allocating new
packet seqno=1,
src=/user/jgray/.logs/dev692.sf2p.facebook.com,58498,1289778232413/10.17.83.207%3A58498.1289778233005,
packetSize=65557,
chunksPerPacket=127, bytesCurBlock=0
2010-11-14 15:44:31,867 DEBUG
[RegionServer:0;dev692.sf2p.facebook.com,58498,1289778232413.logSyncer]
hdfs.DFSClient$DFSOutputStream(3238): DFSClient flush() : saveOffset 512
bytesCurBlock 771 lastFlushOffset 310
2010-11-14 15:44:31,867 DEBUG [DataStreamer for file
/user/jgray/.logs/dev692.sf2p.facebook.com,58498,1289778232413/10.17.83.207%3A58498.1289778233005
block blk_4643386764144918409_1009]
hdfs.DFSClient$DFSOutputStream$DataStreamer(2429): DataStreamer block
blk_4643386764144918409_1009 wrote packet seqno:1 size:804 offsetInBlock:0
lastPacketInBlock:false
2010-11-14 15:44:31,868 DEBUG
[org.apache.hadoop.hdfs.server.datanode.dataxcei...@6a25b72a]
datanode.BlockReceiver(393): Receiving one packet for block
blk_4643386764144918409_1009 of length 783 seqno 1
offsetInBlock 0 lastPacketInBlock false
2010-11-14 15:44:31,868 DEBUG
[org.apache.hadoop.hdfs.server.datanode.dataxcei...@6a25b72a]
datanode.BlockReceiver$PacketResponder(737): PacketResponder 0 adding seqno 1
to ack queue.
2010-11-14 15:44:31,868 DEBUG [PacketResponder 0 for Block
blk_4643386764144918409_1009] datanode.BlockReceiver$PacketResponder(891):
PacketResponder 0 for block blk_4643386764144918409_1009 responded an
ack: Replies for seqno 1 are SUCCESS
2010-11-14 15:44:31,868 DEBUG [ResponseProcessor for block
blk_4643386764144918409_1009]
hdfs.DFSClient$DFSOutputStream$ResponseProcessor(2534): DFSClient Replies for
seqno 0 are FAILED
2010-11-14 15:44:31,868 DEBUG [PacketResponder 0 for Block
blk_4643386764144918409_1009] datanode.BlockReceiver$PacketResponder(789):
PacketResponder 0 seqno = -2 for block blk_4643386764144918409_1009
waiting for local datanode to finish write.
2010-11-14 15:44:31,868 WARN [ResponseProcessor for block
blk_4643386764144918409_1009]
hdfs.DFSClient$DFSOutputStream$ResponseProcessor(2580): DFSOutputStream
ResponseProcessor exception for block
blk_4643386764144918409_1009 java.io.IOException: Bad response 1 for block
blk_4643386764144918409_1009 from datanode 127.0.0.1:48738
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2542)
2010-11-14 15:44:31,868 WARN [DataStreamer for file
/user/jgray/.logs/dev692.sf2p.facebook.com,58498,1289778232413/10.17.83.207%3A58498.1289778233005
block blk_4643386764144918409_1009]
hdfs.DFSClient$DFSOutputStream(2616): Error Recovery for block
blk_4643386764144918409_1009 bad datanode[0] 127.0.0.1:48738
2010-11-14 15:44:31,869 INFO
[org.apache.hadoop.hdfs.server.datanode.dataxcei...@6a25b72a]
datanode.BlockReceiver(565): Exception in receiveBlock for block
blk_4643386764144918409_1009 java.io.IOException: Connection reset by peer
2010-11-14 15:44:31,869 INFO [PacketResponder 0 for Block
blk_4643386764144918409_1009] datanode.BlockReceiver$PacketResponder(844):
PacketResponder blk_4643386764144918409_1009 0 : Thread is interrupted.
2010-11-14 15:44:31,869 INFO [PacketResponder 0 for Block
blk_4643386764144918409_1009] datanode.BlockReceiver$PacketResponder(907):
PacketResponder 0 for block blk_4643386764144918409_1009 terminating
2010-11-14 15:44:31,869 INFO
[org.apache.hadoop.hdfs.server.datanode.dataxcei...@6a25b72a]
datanode.DataXceiver(377): writeBlock blk_4643386764144918409_1009 received
exception java.io.IOException: Connection reset by peer
2010-11-14 15:44:31,869 WARN
[RegionServer:0;dev692.sf2p.facebook.com,58498,1289778232413.logSyncer]
hdfs.DFSClient$DFSOutputStream(3293): Error while syncing
java.io.IOException: All datanodes 127.0.0.1:48738 are bad. Aborting...
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2666)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1500(DFSClient.java:2157)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2356)
{noformat}
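The abort itself then comes from the bad-datanode handling: once the
ResponseProcessor flags datanode[0], the streamer drops it from the
pipeline, and on a single-datanode minicluster that leaves nothing to
recover with. Roughly (again an illustrative sketch, not the real
processDatanodeError):
{noformat}
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the "All datanodes ... are bad" condition hit above:
// removing the flagged datanode from a single-node pipeline leaves an
// empty pipeline, so the output stream can only abort.
class PipelineRecoverySketch {
  static List<String> dropBadDatanode(List<String> pipeline, int badIndex)
      throws IOException {
    List<String> remaining = new ArrayList<String>(pipeline);
    remaining.remove(badIndex);
    if (remaining.isEmpty()) {
      throw new IOException("All datanodes " + String.join(",", pipeline)
          + " are bad. Aborting...");
    }
    return remaining;                    // retry on the surviving nodes
  }

  public static void main(String[] args) {
    try {
      // Mirrors the log: a pipeline of one local datanode, index 0 bad.
      dropBadDatanode(Arrays.asList("127.0.0.1:48738"), 0);
    } catch (IOException e) {
      System.out.println(e.getMessage());
    }
  }
}
{noformat}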
> hdfs-724 "breaks" TestHBaseTestingUtility multiClusters
> -------------------------------------------------------
>
> Key: HBASE-3234
> URL: https://issues.apache.org/jira/browse/HBASE-3234
> Project: HBase
> Issue Type: Bug
> Reporter: stack
> Priority: Critical
> Fix For: 0.90.0
>
> Attachments:
> org.apache.hadoop.hbase.TestHBaseTestingUtility-output.txt,
> org.apache.hadoop.hbase.TestHBaseTestingUtility.txt
>
>
> We upgraded our hadoop jar in TRUNK to latest on 0.20-append branch.
> TestHBaseTestingUtility started failing reliably. If I back out hdfs-724,
> the test passes again. This issue is about figuring what's up here.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.