[jira] [Created] (HBASE-11868) Data loss in hlog when the hdfs is unavailable

Liu Shaohui (JIRA) Sun, 31 Aug 2014 19:49:26 -0700

Liu Shaohui created HBASE-11868:
-----------------------------------

             Summary: Data loss in hlog when the hdfs is unavailable
                 Key: HBASE-11868
                 URL: https://issues.apache.org/jira/browse/HBASE-11868
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.98.5
            Reporter: Liu Shaohui
            Assignee: Liu Shaohui
            Priority: Blocker



When using the new thread model in HBase, we found a bug that may cause data
loss when HDFS is unavailable.

When HRegion.doMiniBatchMutation writes WAL edits to the HLog, the HLog first
calls appendNoSync to write the edits and then calls sync with the returned
txid.

Assume that the txid of the current write is 10, syncedTillHere in the HLog is
9, and failedTxid is 0. When HDFS is unavailable, the AsyncWriter or
AsyncSyncer fails to append the edits or sync, and then updates both
syncedTillHere and failedTxid to 10.
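The failure bookkeeping described above can be modelled roughly as follows. This is a minimal sketch, not the HBase code: the field names mirror the report (syncedTillHere, failedTxid, asyncIOE), while the class HLogFailureModel and its method recordFailure are hypothetical stand-ins for the failure path of AsyncWriter/AsyncSyncer.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of the HLog sync state; not the real FSHLog/HLog class.
class HLogFailureModel {
    // Starting state from the example: syncedTillHere = 9, failedTxid = 0.
    final AtomicLong syncedTillHere = new AtomicLong(9);
    final AtomicLong failedTxid = new AtomicLong(0);
    volatile IOException asyncIOE = null;

    // Hypothetical failure path of the async threads: remember the error,
    // mark every txid up to `txid` as failed, advance syncedTillHere so that
    // waiting writers stop waiting, and wake them up.
    void recordFailure(long txid, IOException e) {
        synchronized (syncedTillHere) {
            asyncIOE = e;
            failedTxid.set(txid);
            syncedTillHere.set(txid);
            syncedTillHere.notifyAll();
        }
    }

    public static void main(String[] args) {
        HLogFailureModel log = new HLogFailureModel();
        log.recordFailure(10, new IOException("HDFS unavailable"));
        System.out.println("syncedTillHere=" + log.syncedTillHere.get()
                + " failedTxid=" + log.failedTxid.get());
        // prints: syncedTillHere=10 failedTxid=10
    }
}
```

After the failure, syncedTillHere already equals the txid of the pending write, which is what sets up the problem described next.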

When the HLog then calls sync with txid 10, failedTxid is never checked,
because txid is not greater than syncedTillHere and the while loop body is
skipped. The client believes the write succeeded, but the data was only
written to the memstore, not to the HLog. If the regionserver goes down before
the memstore is flushed, the data will be lost.
{code}
  // sync all transactions up to the specified txid
  private void syncer(long txid) throws IOException {
    synchronized (this.syncedTillHere) {
      // If syncedTillHere >= txid on entry (e.g. the async threads already
      // advanced it to a failed txid), this loop body never runs, so the
      // failedTxid check inside it is skipped.
      while (this.syncedTillHere.get() < txid) {
        try {
          this.syncedTillHere.wait();

          if (txid <= this.failedTxid.get()) {
            assert asyncIOE != null :
              "current txid is among(under) failed txids, but asyncIOE is null!";
            throw asyncIOE;
          }
        } catch (InterruptedException e) {
          LOG.debug("interrupted while waiting for notification from AsyncNotifier");
        }
      }
    }
  }
{code}
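The bug can be reproduced in isolation with a small standalone class. The syncer body follows the excerpt above (logging dropped); the class BuggySyncerDemo, its helper syncReportsFailure, and the failure state set in the constructor are assumptions made so the sketch runs on its own.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical harness around the reported syncer logic.
class BuggySyncerDemo {
    final AtomicLong syncedTillHere = new AtomicLong();
    final AtomicLong failedTxid = new AtomicLong();
    volatile IOException asyncIOE;

    BuggySyncerDemo() {
        // State right after the async thread failed while handling txid 10.
        syncedTillHere.set(10);
        failedTxid.set(10);
        asyncIOE = new IOException("append/sync failed: HDFS unavailable");
    }

    void syncer(long txid) throws IOException {
        synchronized (this.syncedTillHere) {
            // 10 < 10 is false, so neither the wait nor the check runs.
            while (this.syncedTillHere.get() < txid) {
                try {
                    this.syncedTillHere.wait();
                    if (txid <= this.failedTxid.get()) {
                        throw asyncIOE;
                    }
                } catch (InterruptedException e) {
                    // the original only logs here and keeps waiting
                }
            }
        }
    }

    // Returns true if syncer(txid) reported the failure to the caller.
    boolean syncReportsFailure(long txid) {
        try {
            syncer(txid);
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        BuggySyncerDemo log = new BuggySyncerDemo();
        System.out.println("failure reported: " + log.syncReportsFailure(10));
        // prints: failure reported: false
    }
}
```

The sync call returns normally, so the caller treats the write of txid 10 as durable even though it never reached the HLog.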

We can fix this issue by moving the comparison of txid and failedTxid outside
the while loop, so that it runs even when the loop body is skipped.
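A sketch of the proposed fix, using the same hypothetical harness as a reproduction would (the class FixedSyncerDemo and its helper syncReportsFailure are illustrative names, not HBase code): the wait loop only waits, and the failedTxid comparison is done once after the loop, while still holding the monitor.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical harness; the state mirrors the example in the report.
class FixedSyncerDemo {
    final AtomicLong syncedTillHere = new AtomicLong(10);
    final AtomicLong failedTxid = new AtomicLong(10);
    volatile IOException asyncIOE =
            new IOException("append/sync failed: HDFS unavailable");

    void syncer(long txid) throws IOException {
        synchronized (this.syncedTillHere) {
            while (this.syncedTillHere.get() < txid) {
                try {
                    this.syncedTillHere.wait();
                } catch (InterruptedException e) {
                    // keep waiting, as the original does
                }
            }
            // Checked after the loop, so it also runs when syncedTillHere
            // had already reached txid before this call.
            if (txid <= this.failedTxid.get()) {
                throw asyncIOE;
            }
        }
    }

    // Returns true if syncer(txid) reported the failure to the caller.
    boolean syncReportsFailure(long txid) {
        try {
            syncer(txid);
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("failure reported: "
                + new FixedSyncerDemo().syncReportsFailure(10));
        // prints: failure reported: true
    }
}
```

Because failedTxid only records the highest failed txid, this check treats every txid at or below it as failed. Under this model that is conservative (a txid synced successfully before the failure would also be reported as failed), which errs on the safe side for durability.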




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)