[ 
https://issues.apache.org/jira/browse/HBASE-22414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868220#comment-16868220
 ] 

Xiaolin Ha commented on HBASE-22414:
------------------------------------

[~xucang] Thanks for your review.
{quote}why do you think skipping failed region movings is OK?
{quote}
I think that when one region fails to move during a move to an RSGroup, the 
other regions that still need to be moved should not be affected. The old code 
simply throws the exception and aborts the whole move, which leaves the 
remaining regions on the wrong RSs, as mentioned in the summary. If users call 
these methods and the move of servers or tables fails, some regions will be 
on the target group's servers, while the others will stay on the source 
group's (wrong) servers and keep serving. What's more, if users then retry 
moving the servers or tables to the target group to recover the regions on 
the wrong RSs, the method will reject the retry, because meta and ZK were 
already updated before the regions were moved in the previous call. So 
skipping failed regions helps get as many regions as possible onto the 
correct RSs.
{quote}And sounds like your proposed change will change behavior regarding the 
error handling
{quote}
Regions are moved to the RSGroup serially, using submitAndWaitProcedure(), 
so a later region is moved only after the earlier ones have moved 
successfully. My code catches the exception from a single move and retries 
the failed regions afterwards. I think my code still needs a change: when 
the retries are exhausted, the exception for the failed regions should still 
be thrown, so that the upper layer can handle it. What do you think?
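
To make the skip-and-retry idea above concrete, here is a minimal 
stand-alone sketch. It is not the attached patch and does not use HBase 
classes: RegionMover, moveAll and maxRetries are hypothetical names, and 
RegionMover.move() only stands in for a blocking call such as 
submitAndWaitProcedure().

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SkipAndRetryMoveSketch {

  /** Hypothetical stand-in for submitting one region move and waiting for it. */
  interface RegionMover {
    void move(String regionName) throws IOException;
  }

  /**
   * Tries to move every region, skipping the ones that fail, then retries
   * only the failed ones. Throws once the retries are exhausted and some
   * regions still could not be moved, so the caller can handle it.
   */
  static void moveAll(List<String> regions, RegionMover mover, int maxRetries)
      throws IOException {
    List<String> pending = new ArrayList<>(regions);
    IOException lastFailure = null;
    // Attempt 0 is the first pass; attempts 1..maxRetries are retries.
    for (int attempt = 0; attempt <= maxRetries && !pending.isEmpty(); attempt++) {
      List<String> failed = new ArrayList<>();
      for (String region : pending) {
        try {
          mover.move(region);
        } catch (IOException e) {
          // Skip this region for now instead of aborting the whole move.
          failed.add(region);
          lastFailure = e;
        }
      }
      pending = failed;
    }
    if (!pending.isEmpty()) {
      throw new IOException("Failed to move regions after retries: " + pending,
          lastFailure);
    }
  }

  public static void main(String[] args) throws IOException {
    Set<String> failedOnce = new HashSet<>();
    List<String> moved = new ArrayList<>();
    // Simulated mover: "r2" fails on its first attempt only.
    RegionMover flaky = region -> {
      if (region.equals("r2") && failedOnce.add(region)) {
        throw new IOException("simulated failure for " + region);
      }
      moved.add(region);
    };
    moveAll(Arrays.asList("r1", "r2", "r3"), flaky, 1);
    System.out.println(moved); // prints [r1, r3, r2]
  }
}
```

In this sketch a failure of r2 no longer prevents r3 from being moved, and 
the caller still sees an exception if r2 keeps failing after the retries.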

> Interruption of moving regions in RSGroup will cause regions on wrong rs
> ------------------------------------------------------------------------
>
>                 Key: HBASE-22414
>                 URL: https://issues.apache.org/jira/browse/HBASE-22414
>             Project: HBase
>          Issue Type: Bug
>          Components: rsgroup
>    Affects Versions: 2.2.0
>            Reporter: Xiaolin Ha
>            Assignee: Xiaolin Ha
>            Priority: Major
>         Attachments: HBASE-22414.master.001.patch
>
>
> We bulk move regions to the target RSGroup, and each region move submits a 
> TRSP, but a single TRSP that encounters an exception terminates the whole 
> move. The remaining regions will not be moved to the correct servers unless 
> they are reassigned.
> I think we can skip the regions that failed to move, and retry them after 
> all regions have been traversed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
