[ 
https://issues.apache.org/jira/browse/HBASE-14949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051961#comment-15051961
 ] 

stack commented on HBASE-14949:
-------------------------------

SequenceId has a region scope. If you play an edit into a Region twice, it is 
fine.

If you want to skip edits already replayed, you could do something like the 
mechanism we have where master skips all edits that are less than the highest 
sequenceid that has been saved to an hfile (for that region). Regionservers 
report to the master the highest flushed sequenceid per region on their 
heartbeat. If the master crashes, it loses this Map and so will replay edits
that the Region has already seen, but no harm done, just resources consumed.
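
A rough sketch of that skip check, for illustration only. The class and method names here are hypothetical and are not HBase's actual code, and treating the boundary as inclusive (an edit equal to the flushed sequenceid is skipped) is an assumption on my part:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; not an HBase class. */
class FlushedSequenceIdFilter {
    // region name -> highest sequenceid known to be saved to an hfile
    private final Map<String, Long> flushedSeqIds = new HashMap<>();

    /** Record a per-region flushed sequenceid, e.g. from a heartbeat report. */
    void report(String region, long flushedSeqId) {
        // Keep the maximum so a stale report never lowers the mark.
        flushedSeqIds.merge(region, flushedSeqId, Long::max);
    }

    /** True if the edit is already covered by a flush and can be skipped. */
    boolean canSkip(String region, long seqId) {
        Long flushed = flushedSeqIds.get(region);
        // Unknown region (e.g. the Map was lost in a crash): replay everything.
        // Inclusive comparison is an assumption of this sketch.
        return flushed != null && seqId <= flushed;
    }
}
```

If the Map is empty after a restart, canSkip returns false for everything, which is the "replay more than needed but no harm done" behavior.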

To skip replaying edits already seen, you might keep a running low-water mark 
per region in the Master memory of the last edit sequenceid shipped to a 
region. If the Master crashes, we'd lose this Map and we'd replay edits more
than once, but no harm done, just resources consumed.
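
Something like the following is what I have in mind, as a sketch only. ReplayTracker and its methods are made-up names, and it assumes edits for a region are read in increasing sequenceid order within each WAL:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; not an HBase class. */
class ReplayTracker {
    // region name -> sequenceid of the last edit shipped to that region
    private final Map<String, Long> lastShipped = new HashMap<>();

    /** Returns true if the edit is new for the region, and records it. */
    boolean shouldShip(String region, long seqId) {
        Long last = lastShipped.get(region);
        if (last != null && seqId <= last) {
            return false; // already shipped from an earlier WAL
        }
        lastShipped.put(region, seqId);
        return true;
    }

    /** Filters one WAL's sequenceids for a region, keeping only unseen edits. */
    List<Long> replay(String region, long[] seqIds) {
        List<Long> shipped = new ArrayList<>();
        for (long seqId : seqIds) {
            if (shouldShip(region, seqId)) {
                shipped.add(seqId);
            }
        }
        return shipped;
    }
}
```

Replaying an old WAL holding 0, 1, 2 and then a new WAL holding 1, 2, 3 for the same region would ship 0, 1, 2 from the first file and only 3 from the second.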

Looking at patch...

Do we need RecoveryFileContext? Doesn't Reader have most of this in it?

What about the case where both files have the same first edit in them? I.e. we
open a file and try to write ten edits to it (sequenceid 0, 1, 2...10) and we
fail, so we open a new WAL and try to play the same ten. Replaying, both files
will have a sequenceid of 0 as first entry?



> Skip duplicate entries when replay WAL.
> ---------------------------------------
>
>                 Key: HBASE-14949
>                 URL: https://issues.apache.org/jira/browse/HBASE-14949
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Heng Chen
>         Attachments: HBASE-14949.patch, HBASE-14949_v1.patch
>
>
> Per the HBASE-14004 design, there will be duplicate entries in different
> WALs. This happens when an hflush fails: we close the old WAL at the 'acked
> hflushed' length, then open a new WAL and write the unacked hflushed entries
> into it. So there may be some overlap between the old WAL and the new WAL.
> We should skip the duplicate entries on replay. I think it does no harm to
> the current logic, so maybe we do it first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)