[ 
https://issues.apache.org/jira/browse/HBASE-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009261#comment-14009261
 ] 

Jeffrey Zhong commented on HBASE-11094:
---------------------------------------

Thanks for the comments!

{quote}
How does operator know when this has been done?
{quote}
HBASE-10544 will definitely help. In the meantime, an administrator needs to 
wait for all split tasks under the splitLogZNode znode to clear, then restart 
the master, and then restart all region servers. 
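
The "wait until clear" step could be scripted roughly as below. This is only a 
sketch: SplitTaskWaiter and waitUntilClear are hypothetical names, and the 
supplier stands in for whatever lists the children of splitLogZNode (for 
example, a ZooKeeper client call). The restarts would happen only after it 
returns true.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper, not part of HBase.
final class SplitTaskWaiter {
    /**
     * Polls the supplied listing of split tasks until it is empty or the
     * attempts run out. Returns true only if the listing was seen empty.
     */
    static boolean waitUntilClear(Supplier<List<String>> listSplitTasks,
                                  int maxAttempts, long sleepMillis) {
        for (int i = 0; i < maxAttempts; i++) {
            if (listSplitTasks.get().isEmpty()) {
                return true; // no outstanding split tasks: safe to restart
            }
            if (i < maxAttempts - 1) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt and report "not clear".
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```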

{quote}
I'm talking about the location for RegionServerConfigMismatchException
{quote}
Any suggestions on where I should put it?

{quote}
Suggest rename as openForReplay. 
{quote}
Ok. I'll change the name in the next patch.

{quote}
Or, if a crash after the M and the RS have been rolling restarted.  Only one RS 
will be able to open regions.  It could take a while  for the M to figure this 
out going by the below
{quote}
Yes, for regions in recovery. A normal region move/open (one without any 
recovery work) will not be affected. Also, a rolling restart of the RSs 
shouldn't take a long time.

{quote}
What happens on non-upgraded servers when we pass the code path that this is 
inserted into?
{quote}
That is what blocks rolling upgrades. If both the old and the upgraded code are 
aware of the different recovery modes (i.e. both include this JIRA's patch), 
we're fine.

{quote}
 What would happen in the above scenarios?
{quote}
The above code makes sure a SplitLogWorker only grabs split log tasks intended 
for the same recovery mode. 
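
To illustrate the idea only (this is not the patch code): every type and method 
name below is hypothetical, and the sketch assumes each task records the 
recovery mode it was created for, so a worker can skip tasks meant for the 
other mode.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; not the actual SplitLogWorker code.
enum RecoveryMode { LOG_SPLITTING, LOG_REPLAY }

final class SplitTask {
    final String path;
    final RecoveryMode mode; // mode the task was scheduled under

    SplitTask(String path, RecoveryMode mode) {
        this.path = path;
        this.mode = mode;
    }
}

final class ModeAwareWorker {
    private final RecoveryMode workerMode;

    ModeAwareWorker(RecoveryMode workerMode) {
        this.workerMode = workerMode;
    }

    /** Returns only the tasks whose recovery mode matches this worker's. */
    List<SplitTask> grabbable(List<SplitTask> candidates) {
        List<SplitTask> matching = new ArrayList<>();
        for (SplitTask task : candidates) {
            if (task.mode == workerMode) {
                matching.add(task);
            }
        }
        return matching;
    }
}
```

With this shape, a worker running in one mode leaves tasks of the other mode 
for a worker that is configured to handle them.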




> Distributed log replay is incompatible for rolling restarts
> -----------------------------------------------------------
>
>                 Key: HBASE-11094
>                 URL: https://issues.apache.org/jira/browse/HBASE-11094
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Enis Soztutar
>            Assignee: Jeffrey Zhong
>            Priority: Blocker
>             Fix For: 0.99.0
>
>         Attachments: hbase-11094-v2.patch, hbase-11094.patch
>
>
> 0.99.0 comes with dist log replay by default (HBASE-10888). However, reading 
> the code and discussing this with Jeffrey, we realized that the dist log 
> replay code is not compatible with rolling upgrades from 0.98.0 and 1.0.0.
> The issue is that the region server looks at its own configuration to decide 
> whether the region should be opened in replay mode or not. The open region 
> RPC does not contain that info. So if dist log replay is enabled on master, 
> the master will assign the region and schedule replay tasks. If the region is 
> opened in a RS that does not have this conf enabled, then it will happily 
> open the region in normal mode (not replay mode) causing possible (transient) 
> data loss. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
