[ https://issues.apache.org/jira/browse/HBASE-13526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508585#comment-14508585 ]

Hudson commented on HBASE-13526:
--------------------------------

SUCCESS: Integrated in HBase-1.1 #424 (See [https://builds.apache.org/job/HBase-1.1/424/])
HBASE-13526 TestRegionServerReportForDuty can be flaky: hang or timeout (jerryjch: rev 92e689ddd8948335bb7211a5de3bc13ad7a2f7f2)
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerReportForDuty.java


> TestRegionServerReportForDuty can be flaky: hang or timeout
> -----------------------------------------------------------
>
>                 Key: HBASE-13526
>                 URL: https://issues.apache.org/jira/browse/HBASE-13526
>             Project: HBase
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 2.0.0, 1.1.0, 0.98.12
>            Reporter: Jerry He
>            Assignee: Jerry He
>            Priority: Minor
>             Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2, 1.2.0
>
>         Attachments: HBASE-13526.patch
>
>
> This test case is from HBASE-13317.
> The test uses a custom region server to simulate reportForDuty in a master 
> failover case. The custom RS starts, then the primary master fails, and the 
> custom RS then sends reportForDuty to the second master after master 
> failover.
> The test occasionally hangs or times out.
> The root cause is that during initialization of the first master, the master 
> assigns meta (and creates and assigns the namespace table). Meta may be 
> assigned to the custom RS, which has started (it has placed an rs node in 
> ZK) but will not actually check in and come online. The master then goes 
> through multiple re-assignments, which can be lengthy and cause trouble.
> There are a couple of issues I see in the master assignment code:
> 1. The master puts all the region servers obtained from the ZK rs node into 
> the online server list, including those that have not checked in via RPC. We 
> then assign meta or other regions based on the whole list.
> 2. When one assignment plan fails, we don't exclude the failed server when 
> picking the next destination, which may prolong the assignment process.
> I will provide a patch to fix the test case. The other issues mentioned are 
> up for discussion.
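
The two assignment issues listed in the description can be illustrated with a small sketch. This is not the HBase master code and not what the attached patch does (the patch only fixes the test case). The class DestinationPicker, its methods, and the plain-string server names are all hypothetical, chosen only to show the idea: consider a server eligible only if it has checked in via RPC, and skip any server on which a previous assignment attempt already failed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch only; none of these names exist in HBase. */
final class DestinationPicker {

    /**
     * Filters the servers seen in ZK down to those that have also checked in
     * via RPC (issue 1) and have not already failed an assignment attempt
     * for this region (issue 2). Order of zkServers is preserved.
     */
    static List<String> eligibleServers(List<String> zkServers,
                                        Set<String> checkedIn,
                                        Set<String> failed) {
        List<String> eligible = new ArrayList<>();
        for (String server : zkServers) {
            if (checkedIn.contains(server) && !failed.contains(server)) {
                eligible.add(server);
            }
        }
        return eligible;
    }

    /** Picks the first eligible server, or returns empty if there is none. */
    static Optional<String> pickDestination(List<String> zkServers,
                                            Set<String> checkedIn,
                                            Set<String> failed) {
        List<String> eligible = eligibleServers(zkServers, checkedIn, failed);
        return eligible.isEmpty() ? Optional.empty() : Optional.of(eligible.get(0));
    }
}
```

In the scenario from the description, the custom RS would appear in zkServers but not in checkedIn, so under this sketch it would never be chosen as the destination for meta.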



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
