[ 
https://issues.apache.org/jira/browse/HBASE-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587536#comment-13587536
 ] 

Ted Yu commented on HBASE-7878:
-------------------------------

By directly injecting an IOException (IOE) at the point where recoverLease is 
called, I was able to trigger the append() call.
Here is a snippet from the test output:
{code}
2013-02-26 12:50:28,110 DEBUG [IPC Server handler 2 on 55601] 
namenode.FSEditLog$EditLogFileOutputStream(268): Preallocated 1048576 bytes at 
the end of the edit log (offset 4)
2013-02-26 12:50:28,113 INFO  [IPC Server handler 2 on 55601] 
namenode.FSNamesystem(2458): 
commitBlockSynchronization(newblock=blk_-4984910605561849777_1003, 
file=/user/tyu/hbase/TestHLog/hlogdir/hlog.1361911780362, 
newgenerationstamp=1003, newlength=2634, newtargets=[127.0.0.1:55695, 
127.0.0.1:55698, 127.0.0.1:55701]) successful
2013-02-26 12:50:28,155 INFO  [Thread-227] util.FSHDFSUtils(70): Recovering 
file hdfs://localhost:55601/user/tyu/hbase/TestHLog/hlogdir/hlog.1361911780362
2013-02-26 12:50:28,155 DEBUG [Thread-227] util.FSHDFSUtils(90): Failed 
fs.recoverLease invocation, java.io.IOException, trying fs.append instead
2013-02-26 12:50:28,158 INFO  [IPC Server handler 7 on 55601] 
namenode.FSNamesystem(169): ugi=tyu ip=/127.0.0.1 cmd=append  
src=/user/tyu/hbase/TestHLog/hlogdir/hlog.1361911780362 dst=null  perm=null
2013-02-26 12:50:28,159 DEBUG [Thread-227] 
hdfs.DFSClient$DFSOutputStream(3516): computePacketChunkSize: 
src=/user/tyu/hbase/TestHLog/hlogdir/hlog.1361911780362, chunkSize=442, 
chunksPerPacket=1, packetSize=467
2013-02-26 12:50:28,159 DEBUG [Thread-227] hdfs.DFSClient(189): Connecting to 
127.0.0.1:55697
2013-02-26 12:50:28,162 INFO  [IPC Server handler 0 on 55697] 
datanode.DataNode(2130): Client calls 
recoverBlock(block=blk_-4984910605561849777_1003, targets=[127.0.0.1:55695, 
127.0.0.1:55698, 127.0.0.1:55701])
2013-02-26 12:50:28,163 DEBUG [IPC Server handler 0 on 55697] 
datanode.FSDataset(2143): Interrupting active writer threads for block 
blk_-4984910605561849777_1003
2013-02-26 12:50:28,163 DEBUG [IPC Server handler 0 on 55697] 
datanode.FSDataset(2159): getBlockMetaDataInfo successful 
block=blk_-4984910605561849777_1003 length 2634 genstamp 1003
2013-02-26 12:50:28,165 DEBUG [IPC Server handler 1 on 55700] 
datanode.FSDataset(2143): Interrupting active writer threads for block 
blk_-4984910605561849777_1003
2013-02-26 12:50:28,165 DEBUG [IPC Server handler 1 on 55700] 
datanode.FSDataset(2159): getBlockMetaDataInfo successful 
block=blk_-4984910605561849777_1003 length 2634 genstamp 1003
2013-02-26 12:50:28,166 DEBUG [IPC Server handler 1 on 55703] 
datanode.FSDataset(2143): Interrupting active writer threads for block 
blk_-4984910605561849777_1003
2013-02-26 12:50:28,166 DEBUG [IPC Server handler 1 on 55703] 
datanode.FSDataset(2159): getBlockMetaDataInfo successful 
block=blk_-4984910605561849777_1003 length 2634 genstamp 1003
{code}
TestHLog#testAppendClose passed using Hadoop 1.0.

It is not clear to me at the moment how the IOE can be injected in the test 
without such a hack.
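
One possible way to inject the failure without editing the production code path 
is to put the two HDFS calls behind a small seam that a test can replace. The 
sketch below is only an illustration under that assumption: LeaseOps, 
FailingOps, recover() and the path are hypothetical names, not part of 
FSHDFSUtils or of any attached patch. The stub throws from recoverLease, and the 
caller falls back to append, matching the "trying fs.append instead" line in the 
log.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class LeaseRecoveryInjectionSketch {
    /** Hypothetical seam standing in for the two HDFS calls made during recovery. */
    interface LeaseOps {
        boolean recoverLease(String path) throws IOException;
        void append(String path) throws IOException;
    }

    /** Tries recoverLease first; if it throws IOException, falls back to append. */
    static String recover(LeaseOps ops, String path) throws IOException {
        try {
            return ops.recoverLease(path) ? "recovered" : "pending";
        } catch (IOException e) {
            ops.append(path);
            return "appended";
        }
    }

    /** Test double: records each call and always throws from recoverLease. */
    static class FailingOps implements LeaseOps {
        final List<String> calls = new ArrayList<>();

        @Override
        public boolean recoverLease(String path) throws IOException {
            calls.add("recoverLease");
            throw new IOException("injected for test");
        }

        @Override
        public void append(String path) {
            calls.add("append");
        }
    }

    public static void main(String[] args) throws IOException {
        FailingOps ops = new FailingOps();
        // The path is a placeholder, not a real log file.
        String result = recover(ops, "/tmp/example-hlog");
        System.out.println(result + " " + ops.calls); // appended [recoverLease, append]
    }
}
```

With a seam like this, the test asserts on the recorded calls and does not need 
a modified build.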
                
> recoverFileLease does not check return value of recoverLease
> ------------------------------------------------------------
>
>                 Key: HBASE-7878
>                 URL: https://issues.apache.org/jira/browse/HBASE-7878
>             Project: HBase
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 0.95.0, 0.94.6
>            Reporter: Eric Newton
>            Assignee: Ted Yu
>            Priority: Critical
>             Fix For: 0.95.0, 0.94.6
>
>         Attachments: 7878-trunk-v2.txt, 7878-trunk-v3.txt, 7878-trunk-v4.txt
>
>
> I think this is a problem, so I'm opening a ticket so an HBase person takes a 
> look.
> Apache Accumulo has moved its write-ahead log to HDFS. I modeled the lease 
> recovery for Accumulo after HBase's lease recovery.  During testing, we 
> experienced data loss.  I found it is necessary to wait until recoverLease 
> returns true to know that the file has been truly closed.  In FSHDFSUtils, 
> the return result of recoverLease is not checked. In the unit tests created 
> to check lease recovery in HBASE-2645, the return result of recoverLease is 
> always checked.
> I think FSHDFSUtils should be modified to check the return result, and wait 
> until it returns true.
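
The behaviour requested in the description, checking the result of recoverLease 
and waiting until it is true, can be sketched as a bounded retry loop. This is a 
minimal illustration, not the attached patch: LeaseRecoverer is a hypothetical 
stand-in for the file system call, and the attempt limit and pause are assumed 
parameters.

```java
import java.io.IOException;

public class RecoverLeaseRetrySketch {
    /** Hypothetical stand-in for a file system whose recoverLease returns a boolean. */
    interface LeaseRecoverer {
        boolean recoverLease(String path) throws IOException;
    }

    /**
     * Calls recoverLease until it returns true, which signals that the file is closed.
     * Returns the number of attempts used; throws if the limit is reached first.
     */
    static int waitForLeaseRecovery(LeaseRecoverer fs, String path,
                                    int maxAttempts, long pauseMillis)
            throws IOException, InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (fs.recoverLease(path)) {
                return attempt;
            }
            Thread.sleep(pauseMillis); // give recovery time to finish before asking again
        }
        throw new IOException("lease not recovered after " + maxAttempts
                + " attempts: " + path);
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Stub that reports success only on the third call.
        LeaseRecoverer stub = p -> ++calls[0] >= 3;
        System.out.println(waitForLeaseRecovery(stub, "/tmp/example-wal", 5, 10L)); // 3
    }
}
```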

