[
https://issues.apache.org/jira/browse/HBASE-18366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16109472#comment-16109472
]
Umesh Agashe commented on HBASE-18366:
--------------------------------------
bq. Why this change sir: optional ServerName destination_server = 3;
When destination_server is not specified, the LoadBalancer selects it. This is
useful when a region must be moved off a region server (RS) but the target
server should be chosen by the load balancer.
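To make the intended semantics concrete, here is a minimal sketch of the
fallback, not the actual patch. The class MoveTargetSketch, the method
chooseDestination, and the "first online server other than the source" rule
are all hypothetical stand-ins; the real choice would be made by the
LoadBalancer.

```java
import java.util.List;
import java.util.Optional;

public class MoveTargetSketch {
    /**
     * Hypothetical helper: returns the requested destination when one is
     * given, otherwise falls back to a stand-in for the load balancer.
     */
    static String chooseDestination(Optional<String> requested,
                                    String source,
                                    List<String> onlineServers) {
        if (requested.isPresent()) {
            // Caller named a destination explicitly; use it as-is.
            return requested.get();
        }
        // Assumption: the "balancer" here is just the first online server
        // that is not the source. A real balancer would weigh load.
        for (String server : onlineServers) {
            if (!server.equals(source)) {
                return server;
            }
        }
        throw new IllegalStateException("no candidate destination for " + source);
    }

    public static void main(String[] args) {
        List<String> online = List.of("rs1", "rs2", "rs3");
        System.out.println(chooseDestination(Optional.of("rs3"), "rs1", online));
        System.out.println(chooseDestination(Optional.empty(), "rs1", online));
    }
}
```

With the optional field left unset, callers only say "move this region off
rs1" and leave the placement decision to the balancer.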
[~stack] and I discussed this JIRA, the unit test TestServerCrashProcedure,
and the patches uploaded here. We identified the following areas for improving
the code and fixing bugs:
* Currently UnassignProcedure returns success when the server carrying a region
is not online. The assumption is that ServerCrashProcedure will handle log
splitting etc. for these regions. When UnassignProcedure completes,
MoveRegionProcedure resumes with AssignProcedure, which can then assign the
region before those prerequisite steps have run. The fix is to fail
UnassignProcedure and the parent MoveRegionProcedure if the source server is
not online.
* Embed the logic for selecting the highest versioned region servers for system
table regions in AssignmentManager.processAssignQueue(). This way, wherever in
the code system table regions are assigned or reassigned, only the highest
versioned RSs are considered as target servers.
* As ServerCrashProcedure handles reassignment of regions on a crashed server,
don't process those regions through the call to
AssignmentManager.checkIfShouldMoveSystemRegionAsync().
* Modify the LoadBalancer implementation to consider the highest versioned
region servers as favorites for system table regions.
* Look into refactoring ServerManager to make isServerOnline() and
isServerDead() mutually exclusive.
All these issues are related to AMv2. I will create JIRAs to track them.
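As a rough illustration of the "highest versioned region servers" filter
mentioned in the list above, here is a sketch under stated assumptions. The
class HighestVersionSketch and its methods are hypothetical and are not HBase
APIs; versions are assumed to look like "major.minor.patch" with an optional
"-suffix" that is ignored when comparing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class HighestVersionSketch {
    /** Parses "major.minor.patch[-suffix]"; missing or non-numeric parts count as 0. */
    static int[] parse(String version) {
        String core = version.split("-", 2)[0];
        String[] parts = core.split("\\.");
        int[] out = new int[3];
        for (int i = 0; i < 3 && i < parts.length; i++) {
            try {
                out[i] = Integer.parseInt(parts[i]);
            } catch (NumberFormatException e) {
                out[i] = 0;
            }
        }
        return out;
    }

    /** Compares two version strings component by component. */
    static int compare(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        for (int i = 0; i < 3; i++) {
            if (x[i] != y[i]) {
                return Integer.compare(x[i], y[i]);
            }
        }
        return 0;
    }

    /** Returns the servers (sorted by name) that run the highest version seen. */
    static List<String> highestVersioned(Map<String, String> serverVersions) {
        List<String> best = new ArrayList<>();
        String bestVersion = null;
        // TreeMap gives a deterministic iteration order by server name.
        for (Map.Entry<String, String> e : new TreeMap<>(serverVersions).entrySet()) {
            int c = (bestVersion == null) ? 1 : compare(e.getValue(), bestVersion);
            if (c > 0) {
                // Found a strictly newer version; discard earlier candidates.
                best.clear();
                bestVersion = e.getValue();
            }
            if (c >= 0) {
                best.add(e.getKey());
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, String> versions =
            Map.of("rs1", "1.2.6", "rs2", "2.0.0", "rs3", "2.0.0-alpha");
        System.out.println(highestVersioned(versions)); // prints [rs2, rs3]
    }
}
```

Applying a filter like this at a single point in the assign path means every
caller that assigns or reassigns system table regions sees the same candidate
set.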
Thanks, Umesh
> Fix flaky test
> hbase.master.procedure.TestServerCrashProcedure#testRecoveryAndDoubleExecutionOnRsWithMeta
> ---------------------------------------------------------------------------------------------------------
>
> Key: HBASE-18366
> URL: https://issues.apache.org/jira/browse/HBASE-18366
> Project: HBase
> Issue Type: Bug
> Reporter: Umesh Agashe
> Assignee: Umesh Agashe
> Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: hbase-18366.fix1.patch, hbase-18366.fix2.patch
>
>
> It worked for a few days after enabling it with HBASE-18278, but started
> failing after these commits:
> 6786b2b
> 68436c9
> 75d2eca
> 50bb045
> df93c13
> It works with one commit before: c5abb6c. Need to see what changed with those
> commits.
> Currently it fails with TableNotFoundException.