[ 
https://issues.apache.org/jira/browse/HBASE-11868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116995#comment-14116995
 ] 

Hadoop QA commented on HBASE-11868:
-----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12665695/HBASE-11868-0.98-v1.diff
  against trunk revision .
  ATTACHMENT ID: 12665695

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10662//console

This message is automatically generated.

> Data loss in hlog when the hdfs is unavailable
> ----------------------------------------------
>
>                 Key: HBASE-11868
>                 URL: https://issues.apache.org/jira/browse/HBASE-11868
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.5
>            Reporter: Liu Shaohui
>            Assignee: Liu Shaohui
>            Priority: Blocker
>         Attachments: HBASE-11868-0.98-v1.diff
>
>
> When using the new thread model in HBase 0.98, we found a bug that may cause 
> data loss when HDFS is unavailable.
> When writing WAL edits to the hlog in doMiniBatchMutation of HRegion, the code 
> first calls appendNoSync to write the edits to the hlog and then calls sync 
> with the txid. 
> Assume that the txid of the current write is 10, syncedTillHere in the hlog 
> is 9, and failedTxid is 0. When HDFS is unavailable, the AsyncWriter or 
> AsyncSyncer fails to append or sync the edits, and then updates 
> syncedTillHere to 10 and failedTxid to 10.
> When the hlog then calls sync with txid 10, failedTxid is never checked, 
> because txid is no longer greater than syncedTillHere and the body of the 
> while loop is skipped. The client thinks the write succeeded, but the data 
> was written only to the memstore, not to the hlog. If the regionserver goes 
> down before the memstore is flushed, the data is lost.
> See: FSHLog.java #1348
> {code}
>   // sync all transactions upto the specified txid
>   private void syncer(long txid) throws IOException {
>     synchronized (this.syncedTillHere) {
>       while (this.syncedTillHere.get() < txid) {
>         try {
>           this.syncedTillHere.wait();
>           if (txid <= this.failedTxid.get()) {
>             assert asyncIOE != null :
>               "current txid is among(under) failed txids, but asyncIOE is null!";
>             throw asyncIOE;
>           }
>         } catch (InterruptedException e) {
>           LOG.debug("interrupted while waiting for notification from AsyncNotifier");
>         }
>       }
>     }
>   }
> {code}
> We can fix this issue by moving the comparison of txid and failedTxid outside 
> the while loop, so that it also runs when the loop body is skipped.
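
A minimal, self-contained sketch of the proposed change, for illustration only. It is not the attached patch and not the real FSHLog class: the class name SyncerSketch and the helpers failUpTo and markSynced are hypothetical stand-ins for what AsyncWriter/AsyncSyncer do, and the field names are borrowed from the snippet above. It only shows the failedTxid check moved after the wait loop.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the relevant part of FSHLog; not the real class.
class SyncerSketch {
    private final AtomicLong syncedTillHere = new AtomicLong(0);
    private final AtomicLong failedTxid = new AtomicLong(0);
    private volatile IOException asyncIOE = null;

    // Hypothetical helper: mimics an async thread recording a failed
    // append/sync, which advances both syncedTillHere and failedTxid.
    void failUpTo(long txid, IOException cause) {
        synchronized (syncedTillHere) {
            asyncIOE = cause;
            failedTxid.set(txid);
            syncedTillHere.set(txid);
            syncedTillHere.notifyAll();
        }
    }

    // Hypothetical helper: mimics a successful sync up to txid.
    void markSynced(long txid) {
        synchronized (syncedTillHere) {
            syncedTillHere.set(txid);
            syncedTillHere.notifyAll();
        }
    }

    // Wait until everything up to txid is accounted for, then report failure.
    void syncer(long txid) throws IOException {
        synchronized (syncedTillHere) {
            while (syncedTillHere.get() < txid) {
                try {
                    syncedTillHere.wait();
                } catch (InterruptedException e) {
                    // As in the original snippet: ignore and keep waiting.
                }
            }
            // The check now runs even if the loop body never executed.
            if (txid <= failedTxid.get()) {
                if (asyncIOE == null) {
                    throw new IllegalStateException(
                        "txid is among failed txids, but asyncIOE is null");
                }
                throw asyncIOE;
            }
        }
    }
}
```

With the scenario from the description, calling failUpTo(10, ...) and then syncer(10) throws the recorded IOException instead of returning silently, while a later markSynced(11) followed by syncer(11) returns normally.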



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
