[ https://issues.apache.org/jira/browse/HBASE-13526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508559#comment-14508559 ]
Hudson commented on HBASE-13526:
--------------------------------
FAILURE: Integrated in HBase-1.1.0RC0-JDK8 #4 (See [https://builds.apache.org/job/HBase-1.1.0RC0-JDK8/4/])
HBASE-13526 TestRegionServerReportForDuty can be flaky: hang or timeout
(jerryjch: rev 92e689ddd8948335bb7211a5de3bc13ad7a2f7f2)
* hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerReportForDuty.java
> TestRegionServerReportForDuty can be flaky: hang or timeout
> -----------------------------------------------------------
>
> Key: HBASE-13526
> URL: https://issues.apache.org/jira/browse/HBASE-13526
> Project: HBase
> Issue Type: Bug
> Components: test
> Affects Versions: 2.0.0, 1.1.0, 0.98.12
> Reporter: Jerry He
> Assignee: Jerry He
> Priority: Minor
> Fix For: 2.0.0, 1.1.0, 0.98.13, 1.0.2, 1.2.0
>
> Attachments: HBASE-13526.patch
>
>
> This test case is from HBASE-13317.
> The test uses a custom region server to simulate reportForDuty in a master
> failover case. The custom RS starts, then the primary master fails, and the
> custom RS then calls reportForDuty against the second master after the
> failover.
> The test occasionally hangs or times out.
> The root cause is that during initialization the first master assigns
> meta (and creates and assigns the namespace table). Meta can be assigned to
> the custom RS, which has started (and placed an rs node in ZK) but will not
> actually check in and come online. The master then goes through multiple
> reassignments, which can be lengthy and cause the test to hang or time out.
> There are a couple of issues I see in the master assignment code:
> 1. The master puts all the region servers obtained from the ZK rs node into
> the online server list, including those that have not checked in via RPC.
> Meta and other regions are then assigned based on the whole list.
> 2. When one assignment plan fails, we don't exclude the failed server when
> picking the next destination, which may prolong the assignment process.
> I will provide a patch to fix the test case. The other issues mentioned are
> up for discussion.
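
The two assignment issues described above can be illustrated with a minimal sketch. This is not HBase code: the class DestinationPicker, its methods, and the server names are hypothetical, and the attached patch only changes the test. It assumes a master that tracks which servers have a ZK node separately from which have checked in via RPC, and that skips servers on which an earlier plan failed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical helper (not part of HBase) that picks an assignment target. */
public class DestinationPicker {
    // Servers that have a node in ZK; may include servers that never checked in.
    private final List<String> zkServers = new ArrayList<>();
    // Servers that have actually reported for duty over RPC.
    private final Set<String> checkedIn = new HashSet<>();

    public void addZkServer(String name) {
        zkServers.add(name);
    }

    public void markCheckedIn(String name) {
        checkedIn.add(name);
    }

    /**
     * Returns the first server, in registration order, that has checked in
     * and is not in the set of servers where a previous plan failed.
     */
    public Optional<String> pick(Set<String> failed) {
        for (String server : zkServers) {
            if (checkedIn.contains(server) && !failed.contains(server)) {
                return Optional.of(server);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        DestinationPicker picker = new DestinationPicker();
        picker.addZkServer("rs-custom"); // has a ZK node but never checks in
        picker.addZkServer("rs-1");
        picker.markCheckedIn("rs-1");

        Set<String> failed = new HashSet<>();
        System.out.println(picker.pick(failed)); // rs-custom is skipped
        failed.add("rs-1");
        System.out.println(picker.pick(failed)); // no candidate remains
    }
}
```

With this filtering, a server that only registered in ZK is never chosen, and a retry after a failed plan does not pick the same destination again.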
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)