[ 
https://issues.apache.org/jira/browse/HBASE-22414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868408#comment-16868408
 ] 

Wellington Chevreuil commented on HBASE-22414:
----------------------------------------------

Hi [~Xiaolin Ha], thanks for submitting the PR. I have left some comments 
there. As for the latest concerns raised by [~xucang]:
 {quote}And what's more, after the failed call, if users retry moving the 
servers or tables to the target group to recover regions on the wrong RSs, the 
method will reject the retry, because meta and ZK were already updated before 
the regions were moved in the last call. So skipping failed regions helps get 
as many regions as possible onto the correct RS.{quote}
So, it seems that we currently don't handle these failures well, as we leave 
the regions in an inconsistent state.

{quote}
My code catches the exception from a single region move and retries moving 
the failed regions afterwards. I think my code needs some changes: when the 
retries are exhausted, the exception for the failed regions should still be 
thrown, so that the upper layer can handle it. What do you think about it?
{quote}
 Yep, maybe mark the regions that failed so that the state in meta and ZK can 
be cleaned up before throwing the exception upwards?
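
To make the suggestion concrete, here is a rough sketch in plain Java. It is 
not actual HBase or PR code: SkipAndRetryMover, RegionMover and 
FailedRegionsException are hypothetical names, and regions are plain strings. 
It only illustrates skipping failed regions, retrying them after the full 
pass, and reporting what still failed once retries are exhausted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; none of these names exist in HBase.
final class SkipAndRetryMover {

  /** Stand-in for whatever submits a single region move; may throw. */
  interface RegionMover {
    void move(String region) throws Exception;
  }

  /** Thrown once retries are exhausted; carries the regions that still failed. */
  static final class FailedRegionsException extends Exception {
    final Map<String, Exception> failures;

    FailedRegionsException(Map<String, Exception> failures) {
      super("Failed to move regions: " + failures.keySet());
      this.failures = failures;
    }
  }

  /**
   * Tries every region once, skipping failures instead of aborting, then
   * retries only the failed ones for up to maxRetries additional passes.
   */
  static void moveAll(List<String> regions, RegionMover mover, int maxRetries)
      throws FailedRegionsException {
    List<String> pending = new ArrayList<>(regions);
    Map<String, Exception> failures = new LinkedHashMap<>();
    for (int attempt = 0; attempt <= maxRetries && !pending.isEmpty(); attempt++) {
      failures.clear();
      for (String region : pending) {
        try {
          mover.move(region);
        } catch (Exception e) {
          // Record the failure and keep going with the remaining regions.
          failures.put(region, e);
        }
      }
      // Only the regions that failed in this pass are retried in the next one.
      pending = new ArrayList<>(failures.keySet());
    }
    if (!failures.isEmpty()) {
      // The caller can use the failed set to clean up state before rethrowing.
      throw new FailedRegionsException(failures);
    }
  }
}
```

The caller would catch FailedRegionsException, use its failures map to decide 
which regions need their state cleaned up, and then propagate the error.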

> Interruption of moving regions in RSGroup will cause regions on wrong rs
> ------------------------------------------------------------------------
>
>                 Key: HBASE-22414
>                 URL: https://issues.apache.org/jira/browse/HBASE-22414
>             Project: HBase
>          Issue Type: Bug
>          Components: rsgroup
>    Affects Versions: 2.2.0
>            Reporter: Xiaolin Ha
>            Assignee: Xiaolin Ha
>            Priority: Major
>         Attachments: HBASE-22414.master.001.patch
>
>
> We bulk move regions to the target RSGroup, and each region move submits a 
> TRSP, but if one TRSP encounters an exception, the whole move action 
> terminates. The remaining regions will not be moved to the correct servers 
> unless they are reassigned.
> I think we can skip the regions that failed to move, and retry moving them 
> after all regions have been traversed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
