[ 
https://issues.apache.org/jira/browse/HBASE-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006720#comment-14006720
 ] 

Enis Soztutar commented on HBASE-11094:
---------------------------------------

- This should go inside RegionOpenInfo, not OpenRegionRequest. 
OpenRegionRequest covers multiple regions, and different regions can be in 
different log replay states. 
{code}
+  // open region for distributedLogReplay
+  optional bool isOpenForDistributedLogReplay = 3;
{code}
- Small typo:
{code}
+      // check if current RS has distributedLogReplayh on
{code}
- 
{code}
+        throw new ServiceException(new DoNotRetryIOException("This OpenRegion request is opening "
{code}
Once that is thrown, do we retry on a different server? Do we run out of 
retries? 
- At a higher level, in a cluster going through a rolling restart, even if the 
master has been upgraded, 0.98 RSs won't execute the new SplitLogWorker (SLW) 
and RSRpcServices changes in the patch. So even though the master will create 
the split task for replay, one of the 0.98 RSs can grab the task and do a log 
split instead, right? It can also happen that some of the log files for the 
same server are split and some are replayed. 
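
To make the first point concrete, here is a minimal sketch of a per-region replay flag being checked against the receiving server's own setting. Every class, field, and method name in it is a hypothetical stand-in, not the actual HBase or protobuf-generated API, and the rejection rule is an assumption about what the patch intends rather than a description of it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these types are the real HBase classes.
class ReplayFlagSketch {

    // Stand-in for one per-region entry of an open-region request.
    static final class RegionOpenInfo {
        final String regionName;
        final boolean openForDistributedLogReplay; // assumed per-region flag

        RegionOpenInfo(String regionName, boolean openForDistributedLogReplay) {
            this.regionName = regionName;
            this.openForDistributedLogReplay = openForDistributedLogReplay;
        }
    }

    enum OpenMode { REPLAY, NORMAL, REJECTED }

    // Decide per region by comparing the master's flag with the local setting.
    // Assumption: a server without replay enabled refuses a replay-mode open
    // instead of silently opening the region in normal mode.
    static OpenMode decide(RegionOpenInfo info, boolean localReplayEnabled) {
        if (!info.openForDistributedLogReplay) {
            return OpenMode.NORMAL;
        }
        return localReplayEnabled ? OpenMode.REPLAY : OpenMode.REJECTED;
    }

    // One request can carry several regions, so the result is one mode per region.
    static List<OpenMode> decideAll(List<RegionOpenInfo> request, boolean localReplayEnabled) {
        List<OpenMode> modes = new ArrayList<>();
        for (RegionOpenInfo info : request) {
            modes.add(decide(info, localReplayEnabled));
        }
        return modes;
    }

    public static void main(String[] args) {
        List<RegionOpenInfo> request = Arrays.asList(
            new RegionOpenInfo("regionA", true),
            new RegionOpenInfo("regionB", false));
        System.out.println(decideAll(request, true));  // prints [REPLAY, NORMAL]
        System.out.println(decideAll(request, false)); // prints [REJECTED, NORMAL]
    }
}
```

A single request-level flag could not express the mixed case above, where regionA needs replay and regionB does not. A 0.98 RS would not run any such check at all, which is the concern in the last point.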

> Distributed log replay is incompatible for rolling restarts
> -----------------------------------------------------------
>
>                 Key: HBASE-11094
>                 URL: https://issues.apache.org/jira/browse/HBASE-11094
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Enis Soztutar
>            Assignee: Jeffrey Zhong
>            Priority: Blocker
>             Fix For: 0.99.0
>
>         Attachments: hbase-11094.patch
>
>
> 0.99.0 comes with dist log replay by default (HBASE-10888). However, reading 
> the code and discussing this with Jeffrey, we realized that the dist log 
> replay code is not compatible with rolling upgrades from 0.98.0 and 1.0.0.
> The issue is that the region server looks at its own configuration to decide 
> whether the region should be opened in replay mode or not. The open region 
> RPC does not contain that info. So if dist log replay is enabled on master, 
> the master will assign the region and schedule replay tasks. If the region is 
> opened on an RS that does not have this conf enabled, then it will happily 
> open the region in normal mode (not replay mode), causing possible (transient) 
> data loss. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
