[
https://issues.apache.org/jira/browse/HBASE-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587536#comment-13587536
]
Ted Yu commented on HBASE-7878:
-------------------------------
By injecting an IOException (IOE) directly at the recoverLease call site, I was
able to trigger the append() call.
Here is a snippet from the test output:
{code}
2013-02-26 12:50:28,110 DEBUG [IPC Server handler 2 on 55601]
namenode.FSEditLog$EditLogFileOutputStream(268): Preallocated 1048576 bytes at
the end of the edit log (offset 4)
2013-02-26 12:50:28,113 INFO [IPC Server handler 2 on 55601]
namenode.FSNamesystem(2458):
commitBlockSynchronization(newblock=blk_-4984910605561849777_1003,
file=/user/tyu/hbase/TestHLog/hlogdir/hlog.1361911780362,
newgenerationstamp=1003, newlength=2634, newtargets=[127.0.0.1:55695,
127.0.0.1:55698, 127.0.0.1:55701]) successful
2013-02-26 12:50:28,155 INFO [Thread-227] util.FSHDFSUtils(70): Recovering
file hdfs://localhost:55601/user/tyu/hbase/TestHLog/hlogdir/hlog.1361911780362
2013-02-26 12:50:28,155 DEBUG [Thread-227] util.FSHDFSUtils(90): Failed
fs.recoverLease invocation, java.io.IOException, trying fs.append instead
2013-02-26 12:50:28,158 INFO [IPC Server handler 7 on 55601]
namenode.FSNamesystem(169): ugi=tyu ip=/127.0.0.1 cmd=append
src=/user/tyu/hbase/TestHLog/hlogdir/hlog.1361911780362 dst=null perm=null
2013-02-26 12:50:28,159 DEBUG [Thread-227]
hdfs.DFSClient$DFSOutputStream(3516): computePacketChunkSize:
src=/user/tyu/hbase/TestHLog/hlogdir/hlog.1361911780362, chunkSize=442,
chunksPerPacket=1, packetSize=467
2013-02-26 12:50:28,159 DEBUG [Thread-227] hdfs.DFSClient(189): Connecting to
127.0.0.1:55697
2013-02-26 12:50:28,162 INFO [IPC Server handler 0 on 55697]
datanode.DataNode(2130): Client calls
recoverBlock(block=blk_-4984910605561849777_1003, targets=[127.0.0.1:55695,
127.0.0.1:55698, 127.0.0.1:55701])
2013-02-26 12:50:28,163 DEBUG [IPC Server handler 0 on 55697]
datanode.FSDataset(2143): Interrupting active writer threads for block
blk_-4984910605561849777_1003
2013-02-26 12:50:28,163 DEBUG [IPC Server handler 0 on 55697]
datanode.FSDataset(2159): getBlockMetaDataInfo successful
block=blk_-4984910605561849777_1003 length 2634 genstamp 1003
2013-02-26 12:50:28,165 DEBUG [IPC Server handler 1 on 55700]
datanode.FSDataset(2143): Interrupting active writer threads for block
blk_-4984910605561849777_1003
2013-02-26 12:50:28,165 DEBUG [IPC Server handler 1 on 55700]
datanode.FSDataset(2159): getBlockMetaDataInfo successful
block=blk_-4984910605561849777_1003 length 2634 genstamp 1003
2013-02-26 12:50:28,166 DEBUG [IPC Server handler 1 on 55703]
datanode.FSDataset(2143): Interrupting active writer threads for block
blk_-4984910605561849777_1003
2013-02-26 12:50:28,166 DEBUG [IPC Server handler 1 on 55703]
datanode.FSDataset(2159): getBlockMetaDataInfo successful
block=blk_-4984910605561849777_1003 length 2634 genstamp 1003
{code}
TestHLog#testAppendClose passed against Hadoop 1.0.
It is not yet clear to me how the IOE can be injected in the test without
such a hack.
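One possible way to inject the failure without editing production code is to
put the filesystem calls behind a small seam and hand the test a stub that
throws. The sketch below is only an illustration of that idea: LeaseOps,
FailingLeaseOps and recoverWithFallback are hypothetical names, not HBase or
HDFS types, and the fallback logic is a simplified stand-in for what
FSHDFSUtils does.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class LeaseFaultInjectionSketch {
    /** Hypothetical seam over the two filesystem calls; not an HBase/HDFS type. */
    interface LeaseOps {
        boolean recoverLease(String path) throws IOException;
        void append(String path) throws IOException;
    }

    /** Test double: recoverLease always throws, and every call is recorded. */
    static final class FailingLeaseOps implements LeaseOps {
        final List<String> calls = new ArrayList<>();

        @Override
        public boolean recoverLease(String path) throws IOException {
            calls.add("recoverLease");
            throw new IOException("injected failure");
        }

        @Override
        public void append(String path) {
            calls.add("append");
        }
    }

    /**
     * Simplified stand-in for the recovery path: try recoverLease first and
     * fall back to append if it throws. Returns true only when recoverLease
     * itself reported the file as closed.
     */
    static boolean recoverWithFallback(LeaseOps ops, String path) throws IOException {
        try {
            return ops.recoverLease(path);
        } catch (IOException e) {
            // Mirrors the "trying fs.append instead" branch seen in the log.
            ops.append(path);
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        FailingLeaseOps ops = new FailingLeaseOps();
        boolean closed = recoverWithFallback(ops, "/hbase/TestHLog/hlogdir/hlog.example");
        System.out.println("closed=" + closed + " calls=" + ops.calls);
    }
}
```

With a seam like this the test can assert that append was reached after the
injected IOE. Whether the real code can be restructured this way is an open
question.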
> recoverFileLease does not check return value of recoverLease
> ------------------------------------------------------------
>
> Key: HBASE-7878
> URL: https://issues.apache.org/jira/browse/HBASE-7878
> Project: HBase
> Issue Type: Bug
> Components: util
> Affects Versions: 0.95.0, 0.94.6
> Reporter: Eric Newton
> Assignee: Ted Yu
> Priority: Critical
> Fix For: 0.95.0, 0.94.6
>
> Attachments: 7878-trunk-v2.txt, 7878-trunk-v3.txt, 7878-trunk-v4.txt
>
>
> I think this is a problem, so I'm opening a ticket for an HBase developer to
> take a look.
> Apache Accumulo has moved its write-ahead log to HDFS. I modeled the lease
> recovery for Accumulo after HBase's lease recovery. During testing, we
> experienced data loss. I found it is necessary to wait until recoverLease
> returns true to know that the file has been truly closed. In FSHDFSUtils,
> the return result of recoverLease is not checked. In the unit tests created
> to check lease recovery in HBASE-2645, the return result of recoverLease is
> always checked.
> I think FSHDFSUtils should be modified to check the return result, and wait
> until it returns true.
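The change proposed in the description (check the return value and wait until
it is true) could look roughly like the loop below. This is a sketch under
assumptions, not the attached patch: LeaseRecoverer is a hypothetical interface
standing in for a call such as DistributedFileSystem#recoverLease(Path), and
the attempt limit and pause are illustrative parameters.

```java
import java.io.IOException;

public class RecoverLeaseRetrySketch {
    /** Hypothetical stand-in for a call such as DistributedFileSystem#recoverLease(Path). */
    interface LeaseRecoverer {
        boolean recoverLease(String path) throws IOException;
    }

    /**
     * Calls recoverLease until it returns true or maxAttempts is exhausted.
     * Returns the number of attempts used, or -1 if the file was never
     * reported as closed.
     */
    static int waitForLeaseRecovery(LeaseRecoverer fs, String path,
                                    int maxAttempts, long pauseMillis)
            throws IOException, InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (fs.recoverLease(path)) {
                return attempt; // file is closed; safe to read it now
            }
            if (attempt < maxAttempts) {
                Thread.sleep(pauseMillis); // give the NameNode time to finish recovery
            }
        }
        return -1;
    }

    public static void main(String[] args) throws Exception {
        // Stub that reports "not yet closed" twice before succeeding.
        int[] remainingFalse = {2};
        LeaseRecoverer stub = path -> remainingFalse[0]-- <= 0;
        int attempts = waitForLeaseRecovery(stub, "/hbase/hlog.example", 5, 10L);
        System.out.println("recovered after " + attempts + " attempt(s)");
    }
}
```

Bounding the number of attempts is a design choice in this sketch; the
description itself only asks to wait until recoverLease returns true.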