[ 
https://issues.apache.org/jira/browse/HBASE-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013190#comment-14013190
 ] 

Enis Soztutar commented on HBASE-11094:
---------------------------------------

- Agreed with Stack. This should not be in the exceptions package, and it 
belongs in hbase-server rather than hbase-client. 
{code}
+++ hbase-client/src/main/java/org/apache/hadoop/hbase/exceptions/RegionServerConfigMismatchException.java
{code}
- Instead of SplitLogTask getting the conf, can we pass the mode directly to 
the constructor? SLT should be like a POJO. Also, when the worker creates 
tasks, should it set the mode as well? Right now it seems to set the mode to 
UNKNOWN for everything other than UNASSIGNED. 
- In case of task resubmit, can it be the case that the master has changed 
configuration? Should we get the mode from existing node, and set it to the new 
task? 
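
A minimal sketch of the two suggestions above, assuming the answer to both questions is "yes". All names here (SplitLogTaskSketch, Task, RecoveryMode, State, transition, resubmit) are hypothetical and are not taken from the patch or from the actual HBase classes:

```java
import java.util.Objects;

final class SplitLogTaskSketch {
    // Hypothetical enums for illustration; not the actual HBase definitions.
    enum RecoveryMode { UNKNOWN, LOG_SPLITTING, LOG_REPLAY }
    enum State { UNASSIGNED, OWNED, RESIGNED, DONE, ERR }

    /** Immutable value object: holds the mode it is given and never reads a Configuration. */
    static final class Task {
        final String owner;
        final State state;
        final RecoveryMode mode;

        Task(String owner, State state, RecoveryMode mode) {
            this.owner = Objects.requireNonNull(owner);
            this.state = Objects.requireNonNull(state);
            this.mode = Objects.requireNonNull(mode);
        }

        /** A worker-side state change carries the mode forward instead of resetting it to UNKNOWN. */
        Task transition(String newOwner, State newState) {
            return new Task(newOwner, newState, mode);
        }
    }

    /** On resubmit, reuse the mode stored in the existing node, not the master's current conf. */
    static Task resubmit(Task existing, String master) {
        return new Task(master, State.UNASSIGNED, existing.mode);
    }

    public static void main(String[] args) {
        Task created = new Task("master", State.UNASSIGNED, RecoveryMode.LOG_REPLAY);
        Task owned = created.transition("rs1", State.OWNED);
        Task again = resubmit(owned, "master");
        System.out.println(owned.state + " " + owned.mode);
        System.out.println(again.state + " " + again.mode);
    }
}
```

With this shape, a configuration change on the master between the original submit and the resubmit cannot alter the mode of a task that is already in flight.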

- Some offline discussions with Jeffrey: 
 -- It seems that this will be simpler if the region servers do not look into 
their confs, but just use whatever the split log task or region assignment 
tells them.  
 -- If there are outstanding split log tasks at master restart, but the master 
now has a different configuration for replay, we can either abort the master 
or wait for all the tasks to drain before switching to the new configuration. 
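
The two points from the offline discussion could look roughly like the sketch below. The names (RecoveryModeOnRestartSketch, Decision, decide, openInReplayMode) and the abortOnMismatch flag are assumptions made for illustration, not existing HBase APIs or settings:

```java
import java.util.Arrays;
import java.util.List;

final class RecoveryModeOnRestartSketch {
    // Hypothetical enums for illustration; not the actual HBase definitions.
    enum RecoveryMode { LOG_SPLITTING, LOG_REPLAY }
    enum Decision { USE_CONFIGURED, DRAIN_THEN_SWITCH, ABORT }

    /**
     * Master side, at restart: compare the configured mode with the modes
     * recorded in the outstanding split log tasks.
     */
    static Decision decide(RecoveryMode configured,
                           List<RecoveryMode> outstandingTaskModes,
                           boolean abortOnMismatch) {
        boolean mismatch = false;
        for (RecoveryMode taskMode : outstandingTaskModes) {
            if (taskMode != configured) {
                mismatch = true;
                break;
            }
        }
        if (!mismatch) {
            return Decision.USE_CONFIGURED;
        }
        // Either refuse to start, or keep the old mode until the tasks drain.
        return abortOnMismatch ? Decision.ABORT : Decision.DRAIN_THEN_SWITCH;
    }

    /**
     * Region server side: the decision depends only on what the split log task
     * or region assignment said, never on the server's local conf.
     */
    static boolean openInReplayMode(RecoveryMode modeFromMaster) {
        return modeFromMaster == RecoveryMode.LOG_REPLAY;
    }

    public static void main(String[] args) {
        List<RecoveryMode> outstanding = Arrays.asList(RecoveryMode.LOG_SPLITTING);
        System.out.println(decide(RecoveryMode.LOG_REPLAY, outstanding, false));
        System.out.println(openInReplayMode(RecoveryMode.LOG_REPLAY));
    }
}
```

Because the region server never consults its own configuration in this sketch, a mixed-version or mixed-conf cluster cannot open a region in a mode the master did not intend.
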
bq.  How to proceed? Commit this with fat release note and a note in the 1.0 
doc that we need to include in the upgrade doc the steps this issues requires 
rolling upgrading and then discuss on dev in 1.0 thread if we should turn off 
distributed log replay for 1.0?
Makes sense.

> Distributed log replay is incompatible for rolling restarts
> -----------------------------------------------------------
>
>                 Key: HBASE-11094
>                 URL: https://issues.apache.org/jira/browse/HBASE-11094
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Enis Soztutar
>            Assignee: Jeffrey Zhong
>            Priority: Blocker
>             Fix For: 0.99.0
>
>         Attachments: hbase-11094-v2.patch, hbase-11094-v3.patch, 
> hbase-11094.patch
>
>
> 0.99.0 comes with dist log replay by default (HBASE-10888). However, reading 
> the code and discussing this with Jeffrey, we realized that the dist log 
> replay code is not compatible with rolling upgrades from 0.98.0 and 1.0.0.
> The issue is that the region server looks at its own configuration to decide 
> whether the region should be opened in replay mode or not. The open region 
> RPC does not contain that info. So if dist log replay is enabled on master, 
> the master will assign the region and schedule replay tasks. If the region is 
> opened in a RS that does not have this conf enabled, then it will happily 
> open the region in normal mode (not replay mode) causing possible (transient) 
> data loss. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
