[
https://issues.apache.org/jira/browse/HBASE-8741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13688294#comment-13688294
]
Himanshu Vashishtha commented on HBASE-8741:
--------------------------------------------
Thanks for sharing the approach Sergey.
One drawback of the above approach is that it can block writes, and in a
write-heavy application at that.
Re: the scheme of bumping up the sequenceId: Stack raised a valid concern about
a region that was opened while the last WAL file was in use and that might have
an exceptionally high sequence number. Let's say such a region is 'Rg'.
I was thinking of the following approach:
Based on the WAL file size, we can determine the maximum number of Edits a WAL
file can have. Let's say it is X.
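A minimal sketch of how such a bound X might be derived. The names
maxWalFileBytes and minEditBytes are hypothetical inputs introduced only for
illustration (the configured WAL roll size and an assumed smallest possible
serialized edit); they are not HBase APIs.

```java
/** Illustrative only: derives an upper bound X on the edits in one WAL file. */
final class WalEditBoundSketch {

    private WalEditBoundSketch() {
    }

    /**
     * @param maxWalFileBytes assumed maximum size of one WAL file, in bytes
     * @param minEditBytes    assumed smallest possible size of one edit, in bytes
     * @return an upper bound X on the number of edits a single WAL file can hold
     */
    static long maxEditsPerWal(long maxWalFileBytes, long minEditBytes) {
        if (maxWalFileBytes <= 0 || minEditBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        // Round up so the bound is never an underestimate.
        return (maxWalFileBytes + minEditBytes - 1) / minEditBytes;
    }
}
```

Rounding up is a deliberate choice here: an overestimate of X only wastes some
sequenceId space, while an underestimate could allow a collision.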
a) There is a znode per rs: /hbase/sequenceId/rs1[]. It is updated whenever a
region is opened AND we find that we need to bump up the log sequenceId because
the region has a larger sequenceId in its HFiles than the regionserver's current
log sequenceId. Let's say the value stored in the znode is 'SqN2'.
Now, when processing a regionserver failover:
a) Read the trailer of the last completed WAL file to know the sequenceId at
the time the last log was rolled. Let's say the sequenceId is 'SqN1'.
b) While opening the regions of the failed rs in SSH, we read both 'SqN1' and
'SqN2'. If 'SqN1' > 'SqN2', then we are sure that no region 'Rg' was opened in
the last WAL, and we use SqN1 in step c. Otherwise, we use SqN2 in step c.
c) We would hint the new regionserver while opening regions of the dead
regionserver (these regions will carry this info in HRegionInfo) to use
sequenceNumber = SqN + nX, where 'SqN' is the value chosen in step b and 'n' is
the number of incomplete WAL files (those without trailers). In the current
case, n is 1. With multiWAL, n would be the number of WALs we support.
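The arithmetic in steps b and c can be sketched as follows. This is only an
illustration under stated assumptions: the class and method names are
hypothetical, and it assumes that when SqN1 > SqN2 the base value is SqN1.

```java
/** Illustrative only: computes the sequenceId hint for the new regionserver. */
final class SequenceIdHintSketch {

    private SequenceIdHintSketch() {
    }

    /**
     * @param sqn1           sequenceId read from the trailer of the last completed WAL
     * @param sqn2           sequenceId read from the per-regionserver znode
     * @param incompleteWals n, the number of WAL files without a trailer
     * @param maxEditsPerWal X, the maximum number of edits a WAL file can have
     * @return SqN + nX, the sequenceId the new regionserver should start from
     */
    static long hint(long sqn1, long sqn2, int incompleteWals, long maxEditsPerWal) {
        if (incompleteWals < 0 || maxEditsPerWal < 0) {
            throw new IllegalArgumentException("n and X must be non-negative");
        }
        // Step b: if SqN1 > SqN2, no region 'Rg' was opened in the last WAL.
        long base = sqn1 > sqn2 ? sqn1 : sqn2;
        // Step c: skip past every edit the incomplete WAL files could contain.
        long skipped = Math.multiplyExact((long) incompleteWals, maxEditsPerWal);
        return Math.addExact(base, skipped);
    }
}
```

For example, hint(1000, 400, 1, 500) yields 1500, while hint(1000, 5000, 1, 500)
yields 5500 because the znode value is the larger base. The exact-arithmetic
calls make an overflow fail loudly instead of wrapping to a small sequenceId.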
Pros:
1) No blocking of writes. We are adding logic/processing only in recovery path.
2) Not reading WALs multiple times
3) Multi WAL could be supported.
Cons:
1) Extra zk call. But this will be called _only_ when we are bumping the
sequenceID of the RegionServer.
2) One znode per rs. This would be cleaned up when the master is done processing
the dead regionserver.
I think with this, we could allay our concerns about sequenceId collision in
the new regionserver, and regions can be marked as available for writes without
waiting for distributed log replay/splitting to finish.
Please let me know what you think of this. Thanks.
> Mutations on Regions in recovery mode might have same sequenceIDs
> -----------------------------------------------------------------
>
> Key: HBASE-8741
> URL: https://issues.apache.org/jira/browse/HBASE-8741
> Project: HBase
> Issue Type: Bug
> Components: MTTR
> Affects Versions: 0.95.1
> Reporter: Himanshu Vashishtha
> Assignee: Himanshu Vashishtha
>
> Currently, when opening a region, we find the maximum sequence ID from all
> its HFiles and then set the LogSequenceId of the log (in case the latter is at
> a smaller value). This works well in the recovered.edits case as we are not
> writing to the region until we have replayed all of its previous edits.
> With distributed log replay, if we want to enable writes while a region is
> under recovery, we need to make sure that the logSequenceId > maximum
> logSequenceId of the old regionserver. Otherwise, we might have a situation
> where new edits have same (or smaller) sequenceIds.
> If we store region-level information in the WALTrailer, then this scenario
> could be avoided by:
> a) reading the trailer of the "last completed" file, i.e., last wal file
> which has a trailer and,
> b) completely reading the last wal file (this file would not have the
> trailer, so it needs to be read completely).
> In the future, if we switch to multi WAL files, we could read the trailer for
> all completed WAL files, and read the remaining incomplete files completely.