[
https://issues.apache.org/jira/browse/HBASE-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931960#action_12931960
]
stack commented on HBASE-3234:
------------------------------
Good on you Todd.
Ignore the user issue mentioned above.
Lining up a good log and a bad log, here is a good log:
{code}
8173 2010-11-15 01:42:29,781 DEBUG [PRI IPC Server handler 1 on 51098]
hdfs.DFSClient$DFSOutputStream(3099): DFSClient writeChunk allocating new
packet seqno=1, src=/user/stack/.lo
gs/pynchon-432.lan,51098,1289785308076/192.168.1.69%3A51098.1289785308941,
packetSize=65557, chunksPerPacket=127, bytesCurBlock=0
8174 2010-11-15 01:42:29,782 DEBUG
[RegionServer:0;pynchon-432.lan,51098,1289785308076.logSyncer]
hdfs.DFSClient$DFSOutputStream(3170): DFSClient flush() : saveOffset 512
bytesCurB lock 762 lastFlushOffset 301
8175 2010-11-15 01:42:29,782 DEBUG [DataStreamer for file
/user/stack/.logs/pynchon-432.lan,51098,1289785308076/192.168.1.69%3A51098.1289785308941
block blk_8612924244108136372_100 7]
hdfs.DFSClient$DFSOutputStream$DataStreamer(2376): DataStreamer block
blk_8612924244108136372_1007 wrote packet seqno:1 size:795 offsetInBlock:0
lastPacketInBlock:false
8176 2010-11-15 01:42:29,782 DEBUG
[org.apache.hadoop.hdfs.server.datanode.dataxcei...@1a83e35b]
datanode.BlockReceiver(393): Receiving one packet for block
blk_8612924244108136372 _1007 of length 774 seqno 1 offsetInBlock 0
lastPacketInBlock false
8177 2010-11-15 01:42:29,782 DEBUG
[org.apache.hadoop.hdfs.server.datanode.dataxcei...@1a83e35b]
datanode.BlockReceiver(635): Changing block file offset of block
blk_86129242441081 36372_1007 from 301 to 0 meta file offset to 7
8178 2010-11-15 01:42:29,783 DEBUG
[org.apache.hadoop.hdfs.server.datanode.dataxcei...@1a83e35b]
datanode.BlockReceiver$PacketResponder(736): PacketResponder 0 adding seqno 1
to ac k queue.
8179 2010-11-15 01:42:29,783 DEBUG [PacketResponder 0 for Block
blk_8612924244108136372_1007] datanode.BlockReceiver$PacketResponder(806):
PacketResponder 0 for block blk_861292424 4108136372_1007 acking for
packet 1
8180 2010-11-15 01:42:29,783 DEBUG [ResponseProcessor for block
blk_8612924244108136372_1007]
hdfs.DFSClient$DFSOutputStream$ResponseProcessor(2481): DFSClient Replies for
seqno 1 are SUCCESS
8181 2010-11-15 01:42:29,783 DEBUG [PRI IPC Server handler 1 on 51098]
ipc.HBaseRPC$Server(574): Served: put queueTime= 0 procesingTime= 3
8182 2010-11-15 01:42:29,783 DEBUG [PRI IPC Server handler 1 on 51098]
ipc.HBaseServer$Responder(718): IPC Server Responder: responding to #157 from
192.168.1.69:51194
8183 2010-11-15 01:42:29,783 DEBUG [PRI IPC Server handler 1 on 51098]
ipc.HBaseServer$Responder(737): IPC Server Responder: responding to #157 from
192.168.1.69:51194 Wrote 8 byte s.
8184 2010-11-15 01:42:29,783 DEBUG [IPC Client (47) connection to
pynchon-432.lan/192.168.1.69:51098 from an unknown user]
ipc.HBaseClient$Connection(524): IPC Client (47) connecti on to
pynchon-432.lan/192.168.1.69:51098 from an unknown user got value #157
8185 2010-11-15 01:42:29,784 DEBUG [IPC Server handler 4 on 51095]
ipc.HBaseRPC$Invoker(261): Call: put 4
8186 2010-11-15 01:42:29,784 INFO [IPC Server handler 4 on 51095]
catalog.MetaEditor(59): Added region
test,,1289785349688.03c89b2db74c8c3215c5fd46429fa2f0. to META
{code}
Here is bad log
{code}
8118 2010-11-14 23:38:57,616 DEBUG [PRI IPC Server handler 8 on 64748]
hdfs.DFSClient$DFSOutputStream(3154): DFSClient writeChunk allocating new
packet seqno=1, src=/user/stack/.lo
gs/172.16.198.219,64748,1289777896452/172.16.198.219%3A64748.1289777897337,
packetSize=65557, chunksPerPacket=127, bytesCurBlock=0
8119 2010-11-14 23:38:57,617 DEBUG
[RegionServer:0;172.16.198.219,64748,1289777896452.logSyncer]
hdfs.DFSClient$DFSOutputStream(3238): DFSClient flush() : saveOffset 512
bytesCurBl ock 761 lastFlushOffset 300
8120 2010-11-14 23:38:57,617 DEBUG [DataStreamer for file
/user/stack/.logs/172.16.198.219,64748,1289777896452/172.16.198.219%3A64748.1289777897337
block blk_-8256147503844598069_1 007]
hdfs.DFSClient$DFSOutputStream$DataStreamer(2429): DataStreamer block
blk_-8256147503844598069_1007 wrote packet seqno:1 size:794 offsetInBlock:0
lastPacketInBlock:false
8121 2010-11-14 23:38:57,617 DEBUG
[org.apache.hadoop.hdfs.server.datanode.dataxcei...@753bc30a]
datanode.BlockReceiver(393): Receiving one packet for block
blk_-825614750384459806 9_1007 of length 773 seqno 1 offsetInBlock 0
lastPacketInBlock false
8122 2010-11-14 23:38:57,617 DEBUG
[org.apache.hadoop.hdfs.server.datanode.dataxcei...@753bc30a]
datanode.BlockReceiver$PacketResponder(737): PacketResponder 0 adding seqno 1
to ac k queue.
8123 2010-11-14 23:38:57,618 DEBUG [PacketResponder 0 for Block
blk_-8256147503844598069_1007] datanode.BlockReceiver$PacketResponder(891):
PacketResponder 0 for block blk_-8256147 503844598069_1007 responded an
ack: Replies for seqno 1 are SUCCESS
8124 2010-11-14 23:38:57,618 DEBUG [ResponseProcessor for block
blk_-8256147503844598069_1007]
hdfs.DFSClient$DFSOutputStream$ResponseProcessor(2534): DFSClient Replies for
seqno 0 are FAILED
8125 2010-11-14 23:38:57,618 DEBUG [PacketResponder 0 for Block
blk_-8256147503844598069_1007] datanode.BlockReceiver$PacketResponder(789):
PacketResponder 0 seqno = -2 for block b lk_-8256147503844598069_1007
waiting for local datanode to finish write.
8126 2010-11-14 23:38:57,618 WARN [ResponseProcessor for block
blk_-8256147503844598069_1007]
hdfs.DFSClient$DFSOutputStream$ResponseProcessor(2580): DFSOutputStream
ResponseProce ssor exception for block
blk_-8256147503844598069_1007java.io.IOException: Bad response 1 for block
blk_-8256147503844598069_1007 from datanode 127.0.0.1:64740
8127 at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2542)
8128
8129 2010-11-14 23:38:57,619 WARN [DataStreamer for file
/user/stack/.logs/172.16.198.219,64748,1289777896452/172.16.198.219%3A64748.1289777897337
block blk_-8256147503844598069_1 007]
hdfs.DFSClient$DFSOutputStream(2616): Error Recovery for block
blk_-8256147503844598069_1007 bad datanode[0] 127.0.0.1:64740
8130 2010-11-14 23:38:57,619 INFO
[org.apache.hadoop.hdfs.server.datanode.dataxcei...@753bc30a]
datanode.BlockReceiver(565): Exception in receiveBlock for block
blk_-8256147503844 598069_1007 java.io.EOFException: while trying to read
794 bytes
{code}
In the bad, its writing 794 bytes but in good its writing 795 bytes.
The good does a ' Changing block file offset of block
blk_8612924244108136372_1007 from 301 to 0 meta file offset to 7' before it
does PacketResponder 0 adding seqno 1 to ack queue.' and then getting an ack
to report SUCCESS. The 'bad' doesn't do this its size seem to be smaller by
one byte and after acking seqno 1, it then goes on to look for ack on seqno 0
and reports FAILED.
I don't know what this means (smile). Seems significant.
> hdfs-724 "breaks" TestHBaseTestingUtility multiClusters
> -------------------------------------------------------
>
> Key: HBASE-3234
> URL: https://issues.apache.org/jira/browse/HBASE-3234
> Project: HBase
> Issue Type: Bug
> Reporter: stack
> Priority: Critical
> Fix For: 0.90.0
>
> Attachments:
> org.apache.hadoop.hbase.TestHBaseTestingUtility-output.txt,
> org.apache.hadoop.hbase.TestHBaseTestingUtility-output.txt,
> org.apache.hadoop.hbase.TestHBaseTestingUtility.txt
>
>
> We upgraded our hadoop jar in TRUNK to latest on 0.20-append branch.
> TestHBaseTestingUtility started failing reliably. If I back out hdfs-724,
> the test passes again. This issue is about figuring whats up here.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.