[ https://issues.apache.org/jira/browse/HBASE-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023698#comment-17023698 ]

Michael Stack commented on HBASE-23737:
---------------------------------------

The patch did not work, so I dug in more. The problem is that we create the
FavoredNodes global plan as part of Master startup, but later, after hbase:meta
is up, we replace the FNPlan instance with a new one built from the state that
was scanned from meta; i.e. we overwrite the Map that was made on construction.
In the test, you can see that a Region's FNs can be added concurrently if the
Region is assigned quickly at startup. The call to initialize then overwrites
these FNs. Later in the test, when we check the hbase:meta table to ensure all
Regions have FNs, these overwritten ones come back with a null List.
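A minimal, self-contained sketch of the overwrite described above, assuming a plain Map from region name to favored servers stands in for the FavoredNodes plan. The class and method names (FavoredNodesPlanRace, initializeByReplacing, initializeByMerging) are hypothetical and are not the HBase API. It contrasts replacing the map with the meta-scan result against merging the scan into the existing map. Merging with putIfAbsent is only one possible approach, not necessarily what the addendum does.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model: region name -> favored servers.
// This is NOT the HBase FavoredNodesPlan API; names are illustrative only.
public class FavoredNodesPlanRace {

  // Buggy pattern: the plan built from the meta scan replaces the existing one,
  // so entries added concurrently (e.g. by a quick startup assignment) are lost.
  static Map<String, List<String>> initializeByReplacing(
      Map<String, List<String>> current, Map<String, List<String>> scannedFromMeta) {
    return new ConcurrentHashMap<>(scannedFromMeta);
  }

  // One possible fix: fold the scan result into the existing plan instance,
  // keeping entries that are already present instead of overwriting them.
  static Map<String, List<String>> initializeByMerging(
      Map<String, List<String>> current, Map<String, List<String>> scannedFromMeta) {
    scannedFromMeta.forEach(current::putIfAbsent);
    return current;
  }

  public static void main(String[] args) {
    Map<String, List<String>> scanned = new HashMap<>();
    scanned.put("regionA", List.of("rs1", "rs2", "rs3"));

    // regionB was assigned quickly at startup, before initialize ran.
    Map<String, List<String>> plan = new ConcurrentHashMap<>();
    plan.put("regionB", List.of("rs2", "rs3", "rs4"));
    plan = initializeByReplacing(plan, scanned);
    System.out.println("replace: regionB -> " + plan.get("regionB")); // prints null

    Map<String, List<String>> plan2 = new ConcurrentHashMap<>();
    plan2.put("regionB", List.of("rs2", "rs3", "rs4"));
    plan2 = initializeByMerging(plan2, scanned);
    System.out.println("merge:   regionB -> " + plan2.get("regionB")); // prints [rs2, rs3, rs4]
  }
}
```

With putIfAbsent, an entry added at startup wins over whatever the meta scan returned for the same region; if meta should be authoritative instead, a different merge rule would be needed.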

Pushing addendum.

> [Flakey Tests] TestFavoredNodeTableImport fails 30% of the time
> ---------------------------------------------------------------
>
>                 Key: HBASE-23737
>                 URL: https://issues.apache.org/jira/browse/HBASE-23737
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Michael Stack
>            Priority: Major
>         Attachments: 
> 0001-HBASE-23737-Flakey-Tests-TestFavoredNodeTableImport-.patch, 
> 0001-HBASE-23737-Flakey-Tests-TestFavoredNodeTableImport.addendum.fix.patch
>
>
> I spent time on TestFavoredNodeTableImport. It fails with an NPE when we go
> to get the favored nodes for one of the regions. It is sporadic; it fails for
> me locally too, about 30% of the time.
> I tried to study where we are going wrong. The balancer is disabled when we
> start the cluster up again with the FN balancer... but this doesn't seem to be
> the problem.
> It looks like laggard Regions that take their time to open don't show up in
> the global list of favored nodes when the check runs. Adding a wait until
> there are no regions in transition (RIT) seems to stabilize the test.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
