[
https://issues.apache.org/jira/browse/HBASE-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009261#comment-14009261
]
Jeffrey Zhong commented on HBASE-11094:
---------------------------------------
Thanks for the comments!
{quote}
How does operator know when this has been done?
{quote}
HBASE-10544 will definitely help. In the meantime, an administrator needs to
wait for all split tasks under the znode splitLogZNode to clear, then restart
the master, and then restart all region servers.
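As a rough illustration of that wait step, the check below polls until no split tasks remain. This is only a sketch: SplitLogDrainCheck, waitUntilDrained, and the Supplier that lists pending tasks are hypothetical names, not part of the patch. A real check would list the children of splitLogZNode through a ZooKeeper client instead of the simulated list used here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.function.Supplier;

public class SplitLogDrainCheck {
    /**
     * Polls until the supplier reports no outstanding split tasks, or until the
     * attempts run out. The supplier is a hypothetical stand-in for listing the
     * children of splitLogZNode.
     */
    static boolean waitUntilDrained(Supplier<List<String>> listSplitTasks,
                                    int maxAttempts, long sleepMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (listSplitTasks.get().isEmpty()) {
                return true; // no tasks left: restart master, then region servers
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return false; // tasks still pending: do not restart yet
    }

    public static void main(String[] args) {
        // Simulated task list that loses one entry on every poll.
        Deque<String> pending = new ArrayDeque<>(Arrays.asList("wal-1", "wal-2"));
        Supplier<List<String>> lister = () -> {
            List<String> snapshot = new ArrayList<>(pending);
            pending.poll();
            return snapshot;
        };
        System.out.println("drained: " + waitUntilDrained(lister, 5, 10L));
    }
}
```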
{quote}
I'm talking about the location for RegionServerConfigMismatchException
{quote}
Any suggestion where I should put it?
{quote}
Suggest rename as openForReplay.
{quote}
Ok. I'll change the name in the next patch.
{quote}
Or, if there is a crash after the M and the RS have been rolling restarted,
only one RS will be able to open regions. It could take a while for the M to
figure this out, going by the below
{quote}
Yes, for regions in recovery. A normal region move/open (one without any
recovery work) will not be affected. Also, a rolling restart of the RSs
shouldn't take a long time.
{quote}
What happens on non-upgraded servers when we pass the code path that this is
inserted into?
{quote}
That's what blocks a rolling upgrade. If both the old & the upgraded code are
aware of the different recovery modes (including the JIRA patch), we're fine.
{quote}
What would happen in the above scenarios?
{quote}
The above code makes sure that a SplitLogWorker only grabs split log tasks
intended for the same recovery mode.
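To make the idea concrete, here is a minimal sketch of that check, assuming each task records the recovery mode the master intended. RecoveryMode, SplitTask, and Worker are hypothetical stand-ins for illustration, not the classes touched by the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RecoveryModeFilterSketch {
    // Hypothetical names for the two recovery modes discussed in this issue.
    enum RecoveryMode { LOG_SPLITTING, LOG_REPLAY }

    // Hypothetical split log task that carries the mode the master intended.
    static final class SplitTask {
        final String walPath;
        final RecoveryMode mode;

        SplitTask(String walPath, RecoveryMode mode) {
            this.walPath = walPath;
            this.mode = mode;
        }
    }

    // Hypothetical worker that knows only its own configured mode.
    static final class Worker {
        final RecoveryMode localMode;

        Worker(RecoveryMode localMode) {
            this.localMode = localMode;
        }

        // A task is grabbed only when its intended mode matches the worker's mode.
        boolean canGrab(SplitTask task) {
            return task.mode == localMode;
        }

        // Returns the subset of tasks this worker is allowed to take.
        List<SplitTask> grabbable(List<SplitTask> tasks) {
            List<SplitTask> result = new ArrayList<>();
            for (SplitTask task : tasks) {
                if (canGrab(task)) {
                    result.add(task);
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        List<SplitTask> tasks = Arrays.asList(
            new SplitTask("wal-1", RecoveryMode.LOG_REPLAY),
            new SplitTask("wal-2", RecoveryMode.LOG_SPLITTING));
        Worker upgraded = new Worker(RecoveryMode.LOG_REPLAY);
        for (SplitTask task : upgraded.grabbable(tasks)) {
            System.out.println("grabbed " + task.walPath);
        }
    }
}
```

With this shape, a worker running in the other mode simply leaves the task alone, so the task stays available for a worker whose mode matches.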
> Distributed log replay is incompatible for rolling restarts
> -----------------------------------------------------------
>
> Key: HBASE-11094
> URL: https://issues.apache.org/jira/browse/HBASE-11094
> Project: HBase
> Issue Type: Sub-task
> Reporter: Enis Soztutar
> Assignee: Jeffrey Zhong
> Priority: Blocker
> Fix For: 0.99.0
>
> Attachments: hbase-11094-v2.patch, hbase-11094.patch
>
>
> 0.99.0 comes with dist log replay by default (HBASE-10888). However, reading
> the code and discussing this with Jeffrey, we realized that the dist log
> replay code is not compatible with rolling upgrades from 0.98.0 and 1.0.0.
> The issue is that the region server looks at its own configuration to decide
> whether the region should be opened in replay mode or not. The open region
> RPC does not contain that info. So if dist log replay is enabled on master,
> the master will assign the region and schedule replay tasks. If the region is
> opened in a RS that does not have this conf enabled, then it will happily
> open the region in normal mode (not replay mode) causing possible (transient)
> data loss.
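
The mismatch described in the summary above can be sketched as follows. This is only an illustration under stated assumptions: OpenRegionRequest and its openForReplay field are hypothetical stand-ins for whatever the open region RPC ends up carrying, and the fallback to local configuration when the flag is absent is an assumption made for this example.

```java
public class OpenRegionModeSketch {
    // Hypothetical open region request. openForReplay is null when the sender
    // does not include the flag at all.
    static final class OpenRegionRequest {
        final String regionName;
        final Boolean openForReplay;

        OpenRegionRequest(String regionName, Boolean openForReplay) {
            this.regionName = regionName;
            this.openForReplay = openForReplay;
        }
    }

    // Behaviour described in the issue: only the region server's own
    // configuration decides the mode, whatever the master intended.
    static boolean decideFromLocalConf(boolean localReplayEnabled) {
        return localReplayEnabled;
    }

    // Sketch of the alternative: the flag in the request wins, and local
    // configuration is consulted only when the flag is missing (assumption).
    static boolean decideFromRequest(OpenRegionRequest request, boolean localReplayEnabled) {
        if (request.openForReplay != null) {
            return request.openForReplay;
        }
        return localReplayEnabled;
    }

    public static void main(String[] args) {
        // Master has replay enabled; this region server does not.
        OpenRegionRequest request = new OpenRegionRequest("region-a", Boolean.TRUE);
        boolean localReplayEnabled = false;
        System.out.println("local conf only: " + decideFromLocalConf(localReplayEnabled));
        System.out.println("request flag:    " + decideFromRequest(request, localReplayEnabled));
    }
}
```

In the first case the region opens in normal mode even though the master scheduled replay tasks, which is the situation the summary warns about.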