Jeffrey Zhong created HBASE-11099:
-------------------------------------

             Summary: Two situations where we could open a region with smaller 
sequence number
                 Key: HBASE-11099
                 URL: https://issues.apache.org/jira/browse/HBASE-11099
             Project: HBase
          Issue Type: Bug
          Components: regionserver
    Affects Versions: 0.99.0
            Reporter: Jeffrey Zhong


I recently ran into two code paths where we could potentially open a region 
with a smaller sequence number than expected:

1) Inside HRegion#internalFlushcache. This is a side effect of the change to 
WAL sync, which now uses late binding (the sequence number is assigned right 
before the WAL sync).
The flushSeqId may be less than the sequence number of a change included in the 
flush, which may cause the region-opening code to use a smaller-than-expected 
sequence number when we reopen the region.
{code}
flushSeqId = this.sequenceId.incrementAndGet();
...
mvcc.waitForRead(w);
{code}
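The ordering problem in case 1 can be illustrated with a toy model. This is only a sketch and not HBase code: LateBindingFlushSketch, PendingEdit and the method names are hypothetical, and the in-flight edit stands in for whatever mvcc.waitForRead(w) waits on. It assumes the edit is already part of the data being flushed when the flush id is taken.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model, NOT HBase code: all names here are hypothetical.
class LateBindingFlushSketch {
    static final long NO_SEQ_ID = -1L;

    // Stand-in for an edit that is already in the memstore but has not
    // been assigned a sequence number yet.
    static final class PendingEdit {
        long seqId = NO_SEQ_ID;
    }

    final AtomicLong sequenceId = new AtomicLong(0);

    // Late binding: the number is assigned right before the WAL sync.
    void assignBeforeSync(PendingEdit edit) {
        edit.seqId = sequenceId.incrementAndGet();
    }

    // Order described in the report: take the flush id first, then wait
    // for the in-flight edit to complete.
    long flushIdTakenBeforeWait(PendingEdit inFlight) {
        long flushSeqId = sequenceId.incrementAndGet();
        assignBeforeSync(inFlight); // models the work awaited by mvcc.waitForRead(w)
        return flushSeqId;          // smaller than inFlight.seqId
    }

    // One possible ordering that avoids the gap (an assumption, not
    // necessarily the fix adopted): wait first, then take the flush id.
    long flushIdTakenAfterWait(PendingEdit inFlight) {
        assignBeforeSync(inFlight);
        return sequenceId.incrementAndGet(); // larger than inFlight.seqId
    }
}
```

In the first ordering the flushed data contains an edit whose sequence number is above flushSeqId, which is the gap described above.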

2) In HRegion#replayRecoveredEdits, where we have the following code:
{code}
...
          if (coprocessorHost != null) {
            status.setStatus("Running pre-WAL-restore hook in coprocessors");
            if (coprocessorHost.preWALRestore(this.getRegionInfo(), key, val)) {
              // if bypass this log entry, ignore it ...
              continue;
            }
          }
...
          currentEditSeqId = key.getLogSeqNum();
{code} 
If a coprocessor skips some WALEdits at the tail, the function returns a 
smaller currentEditSeqId, so the region may again open with a smaller sequence 
number. This may cause data loss, because the Master may record a larger flushed 
sequence ID and some WALEdits may be skipped during recovery if the region fails 
again.
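The effect of the early continue in case 2 can be shown with a simplified replay loop. This is a sketch under stated assumptions, not the HRegion code: ReplaySeqIdSketch and its methods are hypothetical, a list of longs stands in for the recovered edits, and the bypass predicate stands in for coprocessorHost.preWALRestore(...) returning true. The second method shows one possible way to keep the returned id from shrinking; it is not claimed to be the fix chosen for this issue.

```java
import java.util.List;
import java.util.function.LongPredicate;

// Toy model, NOT HBase code: all names here are hypothetical.
class ReplaySeqIdSketch {

    // Mirrors the reported control flow: a bypassed edit hits `continue`
    // before its sequence id is recorded.
    static long replayAsReported(List<Long> editSeqIds, LongPredicate bypass) {
        long currentEditSeqId = -1L;
        for (long seqId : editSeqIds) {
            if (bypass.test(seqId)) {
                continue; // bypassed edit never updates currentEditSeqId
            }
            currentEditSeqId = seqId;
        }
        return currentEditSeqId;
    }

    // Possible alternative (an assumption): record the sequence id of
    // every edit seen, whether or not the coprocessor bypasses it.
    static long replayTrackingAllEdits(List<Long> editSeqIds, LongPredicate bypass) {
        long currentEditSeqId = -1L;
        for (long seqId : editSeqIds) {
            currentEditSeqId = seqId;
            if (bypass.test(seqId)) {
                continue; // edit is skipped, but its id is already recorded
            }
            // the edit would be applied to the region here
        }
        return currentEditSeqId;
    }
}
```

With edits 1 through 4 and a predicate that bypasses ids 3 and 4, the first method returns 2 while the second returns 4.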




--
This message was sent by Atlassian JIRA
(v6.2#6252)
