[https://issues.apache.org/jira/browse/HBASE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023698#comment-17023698]
Michael Stack commented on HBASE-23737:
---------------------------------------
The patch did not work, so I dug in more. The problem is that we create the
FavoredNodes global plan as part of Master startup, but later, after hbase:meta
is up, we replace that FNPlan instance with a new one built from the state
scanned from meta; i.e. we overwrite the Map that was made on construction. In
the test, you can see that a Region's FNs can be added concurrently, before this
replacement, if the Region is assigned quickly at startup. The call to
initialize then overwrites these FNs. Later in the test, when we check the
hbase:meta table to ensure all Regions have FNs, these overwritten ones come
back with a null List.
Pushing addendum.
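The overwrite described above can be reproduced in miniature. The sketch below is a hypothetical, simplified stand-in (the class FavoredNodesRaceSketch, its methods, and the String region/server names are all assumptions, not HBase's FavoredNodesPlan API). It replays the startup interleaving sequentially: a Region gets FNs before initialize runs, then initialize either replaces the whole map (the bug) or merges the meta snapshot into the existing map. The merge-with-putIfAbsent policy is one plausible fix, assumed here, and is not necessarily what the addendum does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified favored-nodes plan: region name -> favored servers.
public class FavoredNodesRaceSketch {
  private volatile Map<String, List<String>> plan = new ConcurrentHashMap<>();

  // Called when a region is assigned; during startup this can run before initialize.
  public void updateFavoredNodes(String region, List<String> nodes) {
    plan.put(region, new ArrayList<>(nodes));
  }

  public List<String> getFavoredNodes(String region) {
    return plan.get(region);
  }

  // Buggy variant: swaps in a brand-new map built from the meta scan,
  // silently dropping any entries added since construction.
  public void initializeOverwrite(Map<String, List<String>> fromMeta) {
    plan = new ConcurrentHashMap<>(fromMeta);
  }

  // Assumed fix: keep the existing map and only add regions it lacks,
  // so entries written concurrently at startup survive.
  public void initializeMerge(Map<String, List<String>> fromMeta) {
    fromMeta.forEach(plan::putIfAbsent);
  }

  public static void main(String[] args) {
    // Region r1 is assigned quickly at startup, before the meta snapshot is applied.
    FavoredNodesRaceSketch buggy = new FavoredNodesRaceSketch();
    buggy.updateFavoredNodes("r1", List.of("rs1", "rs2", "rs3"));
    buggy.initializeOverwrite(Map.of("r0", List.of("rs4", "rs5", "rs6")));
    System.out.println("overwrite: r1 -> " + buggy.getFavoredNodes("r1"));

    FavoredNodesRaceSketch fixed = new FavoredNodesRaceSketch();
    fixed.updateFavoredNodes("r1", List.of("rs1", "rs2", "rs3"));
    fixed.initializeMerge(Map.of("r0", List.of("rs4", "rs5", "rs6")));
    System.out.println("merge: r1 -> " + fixed.getFavoredNodes("r1"));
  }
}
```

With the overwrite variant, r1 comes back null, matching the null List seen in the test; with the merge variant it keeps [rs1, rs2, rs3].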
> [Flakey Tests] TestFavoredNodeTableImport fails 30% of the time
> ---------------------------------------------------------------
>
> Key: HBASE-23737
> URL: https://issues.apache.org/jira/browse/HBASE-23737
> Project: HBase
> Issue Type: Bug
> Reporter: Michael Stack
> Priority: Major
> Attachments:
> 0001-HBASE-23737-Flakey-Tests-TestFavoredNodeTableImport-.patch,
> 0001-HBASE-23737-Flakey-Tests-TestFavoredNodeTableImport.addendum.fix.patch
>
>
> Spent time on TestFavoredNodeTableImport. It fails with an NPE when we go to
> get favored nodes for one of the regions. It is sporadic; it fails for me
> locally too, about 30% of the time.
> I tried to study where we are going wrong. The balancer is disabled when we
> start the cluster up again on the FN balancer... but this doesn't seem to be
> the problem.
> It looks like laggard Regions taking their time to open don't show up in
> the global list of favored nodes when the check runs. Adding a wait until
> there are no RIT (regions in transition) seems to stabilize the test.
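The "wait until no RIT" workaround is a poll-until-condition loop run before the favored-nodes check. Below is a generic sketch under stated assumptions: the class WaitForNoRitSketch, the waitFor helper, and the AtomicInteger standing in for the regions-in-transition count are all hypothetical; a real HBase test would use the testing utility's own wait helpers rather than this.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

public class WaitForNoRitSketch {
  // Hypothetical helper: polls the condition every intervalMs until it holds
  // (returns true) or timeoutMs elapses (returns false).
  static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition)
      throws InterruptedException {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      Thread.sleep(intervalMs);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // Hypothetical RIT count: a background thread "opens" one laggard region every 50 ms.
    AtomicInteger regionsInTransition = new AtomicInteger(3);
    Thread opener = new Thread(() -> {
      try {
        while (regionsInTransition.get() > 0) {
          Thread.sleep(50);
          regionsInTransition.decrementAndGet();
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    opener.start();

    boolean settled = waitFor(5_000, 20, () -> regionsInTransition.get() == 0);
    opener.join();
    // Only after this would the test scan hbase:meta and check every region has FNs.
    System.out.println("no RIT: " + settled);
  }
}
```

Note that, per the comment above, this wait only hides the race: it gives laggard Regions time to open, but does not stop initialize from overwriting FNs that were added early.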
--
This message was sent by Atlassian Jira
(v8.3.4#803005)