[ 
https://issues.apache.org/jira/browse/HBASE-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028817#comment-14028817
 ] 

Jeffrey Zhong commented on HBASE-11094:
---------------------------------------

Thanks [~enis] for the reviews! 

{quote}
Once that happens, all new tasks are created with this new mode.
{quote}
Yes.

{quote}
Do we still need the changes in open Region RPC? Can we use the region in zk 
under replaying nodes be the canonical state?
{quote}
Yes, the open region RPC changes are still needed: the source of truth is in 
the Master, and the recovering state in ZK might be stale. Eventually, the 
recovering state in ZK will be removed as well. 
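
To illustrate why the mode should travel with the open region RPC rather than 
be re-derived on the region server, here is a minimal, self-contained sketch. 
It is not the code from the patch: the class, the request type, the field 
name openForReplay, and the fall-back to local configuration when the field 
is absent are all hypothetical and only model the idea.

```java
import java.util.Optional;

public class ReplayModeSketch {
    enum OpenMode { NORMAL, RECOVERING }

    /** Hypothetical stand-in for the open region RPC payload. */
    static final class OpenRegionRequest {
        final String regionName;
        // Empty when the sender does not set the field (e.g. an older master).
        final Optional<Boolean> openForReplay;

        OpenRegionRequest(String regionName, Optional<Boolean> openForReplay) {
            this.regionName = regionName;
            this.openForReplay = openForReplay;
        }
    }

    // Behavior described in the issue: the region server consults only its
    // own configuration, which may disagree with the master's.
    static OpenMode decideFromLocalConf(boolean localReplayEnabled) {
        return localReplayEnabled ? OpenMode.RECOVERING : OpenMode.NORMAL;
    }

    // Sketch of the alternative: the master's decision is carried in the
    // request. Falling back to local conf when the field is missing is an
    // assumption made for this example.
    static OpenMode decideFromRequest(OpenRegionRequest req, boolean localReplayEnabled) {
        return req.openForReplay
            .map(flag -> flag ? OpenMode.RECOVERING : OpenMode.NORMAL)
            .orElseGet(() -> decideFromLocalConf(localReplayEnabled));
    }

    public static void main(String[] args) {
        // Master has replay enabled; the region server's local conf does not.
        OpenRegionRequest req = new OpenRegionRequest("region-a", Optional.of(true));
        System.out.println(decideFromLocalConf(false));    // prints NORMAL
        System.out.println(decideFromRequest(req, false)); // prints RECOVERING
    }
}
```

In the mismatched case above, the local-conf decision opens the region in 
normal mode even though the master scheduled replay tasks, while the 
request-driven decision follows the master.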

{quote}
Is this relevant?
{quote}
Both test cases fixed in the patch are flaky, so I just fixed them in this 
patch since the fixes are trivial.

{quote}
Wrong log name:
{quote}
Good catch! I'll fix this when I commit the patch.

I'm planning to check in this patch on Friday if there are no objections.






> Distributed log replay is incompatible for rolling restarts
> -----------------------------------------------------------
>
>                 Key: HBASE-11094
>                 URL: https://issues.apache.org/jira/browse/HBASE-11094
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Enis Soztutar
>            Assignee: Jeffrey Zhong
>            Priority: Blocker
>             Fix For: 0.99.0
>
>         Attachments: hbase-11094-v2.patch, hbase-11094-v3.patch, 
> hbase-11094-v4.patch, hbase-11094-v5.1.patch, hbase-11094-v5.patch, 
> hbase-11094.patch
>
>
> 0.99.0 comes with dist log replay by default (HBASE-10888). However, reading 
> the code and discussing this with Jeffrey, we realized that the dist log 
> replay code is not compatible with rolling upgrades from 0.98.0 and 1.0.0.
> The issue is that the region server looks at its own configuration to decide 
> whether the region should be opened in replay mode or not. The open region 
> RPC does not contain that info. So if dist log replay is enabled on master, 
> the master will assign the region and schedule replay tasks. If the region is 
> opened in a RS that does not have this conf enabled, then it will happily 
> open the region in normal mode (not replay mode) causing possible (transient) 
> data loss. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
