[
https://issues.apache.org/jira/browse/HBASE-10101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843448#comment-13843448
]
Jeffrey Zhong commented on HBASE-10101:
---------------------------------------
The ZK cleanup only clears the master address node and the RS nodes, which
should be removed anyway when a cluster is shut down. The added steps make sure
we have a clean restart for normal unit tests, and there are special cases for
master (cluster) restart scenarios.
I prefer the test case in TestAssignmentManagerOnCluster because it's about
regions not being assigned during a cluster restart.
Below are my comments on the trunk patch:
{code}
+ regionStates.setLastRegionServerOfRegion(sn, encodedName);
+ if (regionInfo.isMetaRegion()) {
+   // If it's meta region, reset the meta location.
+   // So that master knows the right meta region server.
+   MetaRegionTracker.setMetaLocation(watcher, sn);
+ }
{code}
The above is a little dramatic because we are just setting internal in-memory
state to some server. This will confuse future readers.
{code}
- if (expireIfOnline(currentMetaServer)) {
+ if (!serverManager.isServerDead(currentMetaServer)) {
{code}
This isn't ideal because we could have a race condition where a dead meta
server may not be reported (SessionException) in time. We could then skip the
meta re-assignment and leave the master unable to start.
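To illustrate the window I mean, here is a toy sketch. It is not HBase code:
the class and method names below are made up, and it only models the idea that
a check against the dead-server list keeps answering "not dead" for a crashed
server until its expiration has actually been processed.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only: none of these names are real HBase classes or methods.
class MetaServerRaceSketch {
    // Servers whose expiration the master has already processed.
    private final Set<String> knownDead = new HashSet<>();

    // Answers only from what has been recorded so far, not from the
    // server's actual state.
    boolean isServerDead(String server) {
        return knownDead.contains(server);
    }

    // Called when the expiration notice finally reaches the master.
    void processExpiration(String server) {
        knownDead.add(server);
    }

    public static void main(String[] args) {
        MetaServerRaceSketch master = new MetaServerRaceSketch();
        String metaServer = "rs1.example.org,60020,1"; // hypothetical server name

        // The server has crashed, but its expiration has not been reported yet,
        // so the check still says "not dead".
        System.out.println("before report, dead=" + master.isServerDead(metaServer));

        // Only after the report is processed does the answer change.
        master.processExpiration(metaServer);
        System.out.println("after report, dead=" + master.isServerDead(metaServer));
    }
}
```

Any decision keyed on the first answer is made on stale information, which is
the case I'm worried about.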
[~jxiang] Your latest patch looks good to me except for the changes in
HMaster.java. I'd prefer my v3-update patch unless you feel strongly about
your trunk patch.
I'll let you decide which to choose and move on. Thanks.
> testOfflineRegionReAssginedAfterMasterRestart times out sometimes.
> ------------------------------------------------------------------
>
> Key: HBASE-10101
> URL: https://issues.apache.org/jira/browse/HBASE-10101
> Project: HBase
> Issue Type: Bug
> Reporter: Jimmy Xiang
> Assignee: Jeffrey Zhong
> Attachments: hbase-10101-v2.patch, hbase-10101-v3-update.patch,
> hbase-10101-v3.patch, hbase-10101.patch, test.log, trunk-10101.patch,
> trunk-10101_v2.patch
>
>
> Sometimes this test times out. The log is attached. It could be because the
> new cluster takes a while to process the dead server or to assign
> meta.