[
https://issues.apache.org/jira/browse/HBASE-11099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220284#comment-14220284
]
stack commented on HBASE-11099:
-------------------------------
[~syuanjiang]
bq. stack Just to make sure - your HBASE-11135 fixed the issue#1. Is it correct?
Sorry for the delay. Yes. That was the intent.
[~jeffreyz] Above, you and Enis say this is a 0.98 issue too? Should we apply
the fix there as well? Also, in what scenario do you see "...If coprocessor skip some
tail WALEdits"? Is this speculation, or something seen in Phoenix or similar? Thanks.
> Two situations where we could open a region with smaller sequence number
> ------------------------------------------------------------------------
>
> Key: HBASE-11099
> URL: https://issues.apache.org/jira/browse/HBASE-11099
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 0.99.1
> Reporter: Jeffrey Zhong
> Assignee: Stephen Yuan Jiang
> Fix For: 2.0.0, 0.99.2
>
> Attachments: HBASE-11099.v1-2.0.patch
>
>
> Recently I ran into code where we could potentially open a region
> with a smaller sequence number:
> 1) Inside HRegion#internalFlushcache. This is due to the change in how we
> do WAL sync, which now uses late binding (the sequence number is assigned
> right before the WAL sync).
> The flushSeqId may be less than the sequence numbers of changes included in
> the flush, which may cause the region-opening code to use a smaller-than-expected
> sequence number when we later reopen the region.
> {code}
> flushSeqId = this.sequenceId.incrementAndGet();
> ...
> mvcc.waitForRead(w);
> {code}
> 2) In HRegion#replayRecoveredEdits, where we have the following code:
> {code}
> ...
> if (coprocessorHost != null) {
>   status.setStatus("Running pre-WAL-restore hook in coprocessors");
>   if (coprocessorHost.preWALRestore(this.getRegionInfo(), key, val)) {
>     // if bypass this log entry, ignore it ...
>     continue;
>   }
> }
> ...
> currentEditSeqId = key.getLogSeqNum();
> {code}
> If a coprocessor skips some tail WALEdits, then the function will return a smaller
> currentEditSeqId. In the end, the region may also open with a smaller sequence
> number. This may cause data loss because the Master may record a larger flushed
> sequence id, and some WALEdits may be skipped during recovery if the region
> fails again.
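
For illustration only, the two situations in the description can be modelled in a few lines of plain Java. This is a simplified sketch, not HBase code and not the HBASE-11099 patch: the class and method names (SeqIdSketch, lateBinding, replayAsDescribed, replayTrackingSkipped) are hypothetical, edits are reduced to bare sequence numbers, and the coprocessor's bypass decision is stood in for by a predicate. The last method shows one possible way to keep the returned id from falling behind, assuming it is acceptable to record the id before the bypass check.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongPredicate;

// Hypothetical, simplified model of the two situations; not HBase code.
class SeqIdSketch {

    // Situation 1: the flush claims its id first, then an edit that is
    // already in the memstore gets its sequence number bound late
    // (right before the WAL sync). Returns {flushSeqId, editSeqId}.
    static long[] lateBinding() {
        AtomicLong sequenceId = new AtomicLong(0);
        long flushSeqId = sequenceId.incrementAndGet(); // flush claims its id
        long editSeqId = sequenceId.incrementAndGet();  // pending edit bound afterwards
        return new long[] {flushSeqId, editSeqId};
    }

    // Situation 2 as described: a bypassed edit hits 'continue' before
    // the id is recorded, so skipped tail edits never advance it.
    static long replayAsDescribed(List<Long> seqNums, LongPredicate bypass) {
        long currentEditSeqId = -1;
        for (long seq : seqNums) {
            if (bypass.test(seq)) {
                continue; // id of this edit is lost
            }
            currentEditSeqId = seq;
        }
        return currentEditSeqId;
    }

    // One possible alternative (an assumption, not the actual fix):
    // record the id before asking whether the edit should be bypassed.
    static long replayTrackingSkipped(List<Long> seqNums, LongPredicate bypass) {
        long currentEditSeqId = -1;
        for (long seq : seqNums) {
            currentEditSeqId = seq; // recorded even if the edit is skipped
            if (bypass.test(seq)) {
                continue;
            }
            // ... the edit would be applied here ...
        }
        return currentEditSeqId;
    }

    public static void main(String[] args) {
        long[] ids = lateBinding();
        System.out.println("flushSeqId=" + ids[0] + " editSeqId=" + ids[1]);

        List<Long> edits = List.of(5L, 6L, 7L, 8L);
        LongPredicate skipTail = seq -> seq >= 7; // bypass the last two edits
        System.out.println("as described:     " + replayAsDescribed(edits, skipTail));
        System.out.println("tracking skipped: " + replayTrackingSkipped(edits, skipTail));
    }
}
```

In this model the flushed edit carries a sequence number larger than flushSeqId, and the replay loop as described returns the id of the last applied edit rather than the last edit seen.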