[jira] [Commented] (HBASE-8741) Scope sequenceid to the region rather than regionserver (WAS: Mutations on Regions in recovery mode might have same sequenceIDs)

Himanshu Vashishtha (JIRA) Mon, 14 Oct 2013 17:19:29 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-8741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13794688#comment-13794688
 ]


Himanshu Vashishtha commented on HBASE-8741:
--------------------------------------------

Thanks for all the reviews, and sorry about the long hiatus. I have resumed 
work on this and am attaching an updated patch. It takes care of the feedback 
above.
To me, this is an important step in simplifying our sequence Id story 
(disentangling it from the region server HLog).
 
In a nutshell, this is the high-level idea of the patch:

1) Add a sequence Id attribute to the HRegion class. It is set to 
openSequenceId when the region is opened. All append operations use this 
sequenceId when appending to the WAL.

2) FSHLog: 
i) Maintain a map of (region : sequenceId) for each WAL (latestSequenceNums). 
This keeps track of the latest sequence number used by each region when 
appending to the WAL.
ii) Maintain a map of (Path : <latestSequenceNums>) for each rolled WAL file. 
This is used when determining whether a WAL is eligible for archiving. 

A WAL is eligible for archiving when all of its regions have flushed past 
their entries in the latestSequenceNums map (as mentioned in i).

When rolling a WAL, it checks which of the older WALs are eligible for 
archiving.

iii) When we exceed the maximum number of allowed WAL files, we check the 
oldest WAL file to determine which regions should be flushed so that it 
becomes eligible for archiving. 

3) Added test cases in TestHLog to test rolling, archiving, and finding the 
memstores to flush in order to archive a WAL. 
Refactored some test cases to pass a sequenceId parameter when appending to 
HLog.

4) Remove the sequenceId support from HLog (sequenceId and helper methods).
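
To make the bookkeeping in items 1 and 2 concrete, here is a minimal sketch. 
It is not code from the patch: the class names RegionSequenceId and 
WalSequenceTracker, the method names, and the use of plain strings for regions 
and WAL paths are all hypothetical. It also assumes that each append 
increments the region's counter, and that "flushed past" means the flushed 
sequence id is at least the region's entry for that WAL.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the per-region sequence id of item 1. */
class RegionSequenceId {
    private final AtomicLong sequenceId;

    /** Starts the counter at the region's openSequenceId. */
    RegionSequenceId(long openSequenceId) {
        this.sequenceId = new AtomicLong(openSequenceId);
    }

    /** Assumption: every WAL append takes the next value of the counter. */
    long next() {
        return sequenceId.incrementAndGet();
    }
}

/** Hypothetical stand-in for the FSHLog bookkeeping of item 2. */
class WalSequenceTracker {
    // Item 2(i): region -> latest sequence id appended to the current WAL.
    private Map<String, Long> latestSequenceNums = new HashMap<>();

    // Item 2(ii): rolled WAL path -> its latestSequenceNums, oldest first.
    private final Map<String, Map<String, Long>> rolledWals = new LinkedHashMap<>();

    /** Records the sequence id a region used for an append. */
    void append(String region, long sequenceId) {
        latestSequenceNums.put(region, sequenceId);
    }

    /** On roll, keeps the current map under the rolled path and starts a new one. */
    void roll(String rolledWalPath) {
        rolledWals.put(rolledWalPath, latestSequenceNums);
        latestSequenceNums = new HashMap<>();
    }

    /** Item 2(iii): regions that still hold unflushed edits in the given WAL. */
    List<String> regionsToFlush(String walPath, Map<String, Long> flushedSequenceIds) {
        Map<String, Long> entries =
            rolledWals.getOrDefault(walPath, Collections.emptyMap());
        List<String> regions = new ArrayList<>();
        for (Map.Entry<String, Long> e : entries.entrySet()) {
            long flushed = flushedSequenceIds.getOrDefault(e.getKey(), -1L);
            if (flushed < e.getValue()) {
                regions.add(e.getKey());
            }
        }
        return regions;
    }

    /** A rolled WAL is archivable once no region needs a flush for it. */
    boolean isArchivable(String walPath, Map<String, Long> flushedSequenceIds) {
        return regionsToFlush(walPath, flushedSequenceIds).isEmpty();
    }

    /** Oldest rolled WAL, or null if nothing has been rolled yet. */
    String oldestWal() {
        return rolledWals.isEmpty() ? null : rolledWals.keySet().iterator().next();
    }
}
```

In this sketch, a roll would call isArchivable on each older WAL, and 
exceeding the WAL limit would call regionsToFlush(oldestWal(), ...) to pick 
the regions to flush.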

I tested it on Jenkins, and ran a patched version on a cluster under insert 
load while killing region servers and verifying that the data stayed intact. 
IT tests are in progress. 
Thanks.

> Scope sequenceid to the region rather than regionserver (WAS: Mutations on 
> Regions in recovery mode might have same sequenceIDs)
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-8741
>                 URL: https://issues.apache.org/jira/browse/HBASE-8741
>             Project: HBase
>          Issue Type: Bug
>          Components: MTTR
>    Affects Versions: 0.95.1
>            Reporter: Himanshu Vashishtha
>            Assignee: Himanshu Vashishtha
>             Fix For: 0.98.0
>
>         Attachments: HBASE-8741-v0.patch, HBASE-8741-v2.patch, 
> HBASE-8741-v3.patch, HBASE-8741-v4-again.patch, HBASE-8741-v4-again.patch, 
> HBASE-8741-v4.patch, HBASE-8741-v5-again.patch, HBASE-8741-v5.patch
>
>
> Currently, when opening a region, we find the maximum sequence ID from all 
> its HFiles and then set the LogSequenceId of the log (in case the latter is 
> at a smaller value). This works well in the recovered.edits case, as we do 
> not write to the region until we have replayed all of its previous edits. 
> With distributed log replay, if we want to enable writes while a region is 
> under recovery, we need to make sure that the logSequenceId > maximum 
> logSequenceId of the old regionserver. Otherwise, we might have a situation 
> where new edits have same (or smaller) sequenceIds. 
> If we store region-level information in the WALTrailer, then this scenario 
> could be avoided by:
> a) reading the trailer of the "last completed" file, i.e., last wal file 
> which has a trailer and,
> b) completely reading the last wal file (this file would not have the 
> trailer, so it needs to be read completely).
> In the future, if we switch to multiple WAL files, we could read the 
> trailers of all completed WAL files, and read the remaining incomplete 
> files completely.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
