[ https://issues.apache.org/jira/browse/HBASE-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13608535#comment-13608535 ]

Ted Yu commented on HBASE-7878:
-------------------------------

From https://builds.apache.org/job/PreCommit-HBASE-Build/4928//testReport/org.apache.hadoop.hbase.regionserver.wal/TestHLogSplit/testSplitWillNotTouchLogsIfNewHLogGetsCreatedAfterSplitStarted/:
{code}
2013-03-21 00:54:27,404 INFO  [ZombieNewLogWriterRegionServer] 
wal.TestHLogSplit$ZombieNewLogWriterRegionServer(1102): Juliet file creator: 
created file /hbase/hlog/hlog.dat..juliet
2013-03-21 00:54:27,406 INFO  [split-log-closeStream-2] 
wal.HLogSplitter$OutputSink$2(1259): Closed path 
/hbase/t1/ccc/recovered.edits/0000000000000000002.temp (wrote 100 edits in 
221ms)
{code}
Meaning the creation of the fake HLog preceded the call to outputSink.finishWritingAndClose() below:
{code}
        throw new OrphanHLogAfterSplitException(
          "Discovered orphan hlog after split. " + fileSet.iterator().next()
            + " Maybe the HRegionServer was not dead when we started");
      }
    } finally {
      status.setStatus("Finishing writing output logs and closing down.");
      splits = outputSink.finishWritingAndClose();
    }
    status.setStatus("Archiving logs after completed split");
    archiveLogs(srcDir, corruptedLogs, processedLogs, oldLogDir, fs, conf);
{code}
i.e. archiveLogs() was not skipped. This was a timing issue in the test.
One solution is to pass a CountDownLatch into splitLog() and wait on the latch prior to the following:
{code}
      FileStatus[] currFiles = fs.listStatus(srcDir);
      if (currFiles.length > processedLogs.size()) {
{code}
ZombieNewLogWriterRegionServer would count down the latch once the fake HLog is written.
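The proposed coordination can be sketched as follows (a minimal sketch; the class and method names below are illustrative, not part of any actual patch to TestHLogSplit or HLogSplitter):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class SplitRaceSketch {
  // Latch shared between the zombie writer thread and splitLog().
  static final CountDownLatch fakeHLogWritten = new CountDownLatch(1);

  // In ZombieNewLogWriterRegionServer: signal once the fake HLog exists.
  static void onFakeHLogCreated() {
    fakeHLogWritten.countDown();
  }

  // In splitLog(): wait for the signal before re-listing srcDir, so the
  // orphan-log check reliably sees the fake HLog. Returns false on timeout.
  static boolean awaitFakeHLog() {
    try {
      return fakeHLogWritten.await(30, TimeUnit.SECONDS);
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

This removes the race: splitLog() cannot reach the orphan check until the zombie thread has actually created the file.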
                
> recoverFileLease does not check return value of recoverLease
> ------------------------------------------------------------
>
>                 Key: HBASE-7878
>                 URL: https://issues.apache.org/jira/browse/HBASE-7878
>             Project: HBase
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 0.95.0, 0.94.6
>            Reporter: Eric Newton
>            Assignee: Ted Yu
>            Priority: Critical
>             Fix For: 0.95.0, 0.98.0, 0.94.7
>
>         Attachments: 7878.94, 7878-94.addendum, 7878-94.addendum2, 
> 7878-trunk.addendum, 7878-trunk.addendum2, 7878-trunk-v10.txt, 
> 7878-trunk-v11-test.txt, 7878-trunk-v12.txt, 7878-trunk-v13.txt, 
> 7878-trunk-v13.txt, 7878-trunk-v2.txt, 7878-trunk-v3.txt, 7878-trunk-v4.txt, 
> 7878-trunk-v5.txt, 7878-trunk-v6.txt, 7878-trunk-v7.txt, 7878-trunk-v8.txt, 
> 7878-trunk-v9.txt, 7878-trunk-v9.txt
>
>
> I think this is a problem, so I'm opening a ticket so an HBase person takes a 
> look.
> Apache Accumulo has moved its write-ahead log to HDFS. I modeled the lease 
> recovery for Accumulo after HBase's lease recovery.  During testing, we 
> experienced data loss.  I found it is necessary to wait until recoverLease 
> returns true to know that the file has been truly closed.  In FSHDFSUtils, 
> the return result of recoverLease is not checked. In the unit tests created 
> to check lease recovery in HBASE-2645, the return result of recoverLease is 
> always checked.
> I think FSHDFSUtils should be modified to check the return result, and wait 
> until it returns true.
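The check-and-wait the description asks for amounts to a retry loop. In this hedged sketch, the BooleanSupplier stands in for DistributedFileSystem#recoverLease(path), which returns true only once the file is truly closed; names and the sleep value are illustrative:

```java
import java.util.function.BooleanSupplier;

public final class LeaseRecoverySketch {
  /**
   * Retry until lease recovery reports the file closed. Returns the number
   * of recoverLease() calls made before success.
   */
  public static int waitOnLease(BooleanSupplier recoverLease, long sleepMs) {
    int attempts = 1;
    while (!recoverLease.getAsBoolean()) {
      attempts++;
      try {
        // Back off before retrying; recovery may still be in flight.
        Thread.sleep(sleepMs);
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted waiting on lease", ie);
      }
    }
    return attempts;
  }
}
```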

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira