[ https://issues.apache.org/jira/browse/HBASE-11094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007902#comment-14007902 ]
Jeffrey Zhong commented on HBASE-11094:
---------------------------------------
Thanks [~enis] for the comments!
{quote}
the patch attached contains changes to PB generated classes that are not
touched by the patch (mapreduce.proto)
{quote}
That's leftover from others' changes. Even if we run the protobuf compile
against the trunk branch without my changes, the mapreduce changes still show up.
{quote}
This should go inside the RegionOpenInfo
{quote}
Fixed in v2.
{quote}
Once that is thrown, do we retry on a different server? Do we run out of retries?
{quote}
It will retry, though I still created a new exception in v2 so that it's more
explicit.
{quote}
0.98 RS's won't execute the new SLW and RSRpcServices changes in the patch
{quote}
That's right. The old code doesn't have the change, so we cannot do a rolling
upgrade (we have to stop and restart everything). I discussed some options
with Enis offline, but they are not clean and would be a one-time effort. It
seems we still need to turn it off by default for 1.0 to support rolling upgrades.
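
For illustration only, here is a minimal Java sketch of the mismatch described in the issue summary: a region server that consults only its own configuration versus one that honors a flag carried in the open-region request. All class, field, and method names are hypothetical and are not taken from the patch or from HBase itself. The fallback to local configuration when the flag is absent is also an assumption, not the patch's confirmed behavior.

```java
import java.util.Optional;

public class ReplayModeSketch {

    /** Hypothetical stand-in for the per-region part of an open-region request. */
    static final class OpenRequest {
        final String regionName;
        // Empty when the sender (e.g. an older master) does not set the flag.
        final Optional<Boolean> openForReplay;

        OpenRequest(String regionName, Optional<Boolean> openForReplay) {
            this.regionName = regionName;
            this.openForReplay = openForReplay;
        }
    }

    /** Behavior described in the issue: only the server's own conf is consulted. */
    static boolean replayModeFromLocalConf(boolean localReplayEnabled) {
        return localReplayEnabled;
    }

    /**
     * Sketch of the alternative: the flag in the request wins.
     * Assumption: fall back to the local conf when the flag is absent.
     */
    static boolean replayModeFromRequest(OpenRequest req, boolean localReplayEnabled) {
        return req.openForReplay.orElse(localReplayEnabled);
    }

    public static void main(String[] args) {
        // Master has replay enabled, but this region server's conf has it disabled.
        boolean localReplayEnabled = false;
        OpenRequest req = new OpenRequest("region-1", Optional.of(true));

        System.out.println("local conf only: " + replayModeFromLocalConf(localReplayEnabled));
        System.out.println("request flag:    " + replayModeFromRequest(req, localReplayEnabled));
    }
}
```

In this sketch the first decision opens the region in normal mode even though the master scheduled replay tasks, which is the mismatch the issue describes. The second follows the master's intent. A server running the old code would never read such a flag, which is why the flag alone does not make rolling upgrades safe.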
> Distributed log replay is incompatible for rolling restarts
> -----------------------------------------------------------
>
> Key: HBASE-11094
> URL: https://issues.apache.org/jira/browse/HBASE-11094
> Project: HBase
> Issue Type: Sub-task
> Reporter: Enis Soztutar
> Assignee: Jeffrey Zhong
> Priority: Blocker
> Fix For: 0.99.0
>
> Attachments: hbase-11094-v2.patch, hbase-11094.patch
>
>
> 0.99.0 comes with dist log replay by default (HBASE-10888). However, reading
> the code and discussing this with Jeffrey, we realized that the dist log
> replay code is not compatible with rolling upgrades from 0.98.0 and 1.0.0.
> The issue is that the region server looks at its own configuration to decide
> whether the region should be opened in replay mode or not. The open region
> RPC does not contain that info. So if dist log replay is enabled on master,
> the master will assign the region and schedule replay tasks. If the region is
> opened in a RS that does not have this conf enabled, then it will happily
> open the region in normal mode (not replay mode) causing possible (transient)
> data loss.