[ 
https://issues.apache.org/jira/browse/HBASE-20169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16439563#comment-16439563
 ] 

Chia-Ping Tsai commented on HBASE-20169:
----------------------------------------

{quote}Trying to understand the severity here, sounds like not something that 
can happen on production (or even dev) deployment?
{quote}
It can happen in production. If any component in the shutdown path calls 
HMaster#stop, this issue will occur. For example, if there are no live 
regionservers in the cluster, ServerManager#shutdownCluster calls 
HMaster#stop.
{quote}The trick always works here is make timeoutExecutor volatile, and assign 
it to a local variable, and then do the null check and call its method, or just 
do not set it to null...But I prefer we analysis the shutdown method again to 
see if we really need to call procedureExecutor.stop? 

We use timeout executor in a lot of places without null checks, so adding a 
single check here definitely feels insufficient.
{quote}
You are right. I'm trying to understand why we have to stop the timeout 
executor in HMaster#shutdown... That code was introduced by HBASE-19840. I ran 
TestMetaWithReplicas 30 times without stopping the timeout executor in 
HMaster#shutdown, and all runs passed. [~stack] WDYT?

BTW, the NPE is not related to this issue. Perhaps we can push the fix to 
TestAssignmentManagerMetrics first and discuss the NPE in a follow-up.
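
For reference, the "volatile field plus local copy" trick quoted above could look roughly like the sketch below. This is only an illustration under assumptions: TimeoutExecutor and Owner here are hypothetical stand-ins, not the real ProcedureExecutor code, and the method names submit and stop are made up for the example. The point is that the field is read exactly once into a local, so a concurrent stop() cannot null it out between the check and the call.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in classes; these are NOT the real HBase types.
class VolatileLocalCopyDemo {

    /** Hypothetical stand-in for the timeout executor. */
    static class TimeoutExecutor {
        final AtomicInteger added = new AtomicInteger();

        void add(String task) {
            added.incrementAndGet();
        }
    }

    /** Hypothetical owner, loosely modelled on a procedure executor. */
    static class Owner {
        // volatile so that a null written by stop() on another thread is visible here.
        private volatile TimeoutExecutor timeoutExecutor = new TimeoutExecutor();

        /** Returns true if the task was handed over, false if already stopped. */
        boolean submit(String task) {
            // Read the field exactly once; the local cannot become null afterwards.
            TimeoutExecutor local = timeoutExecutor;
            if (local == null) {
                return false;
            }
            local.add(task);
            return true;
        }

        void stop() {
            timeoutExecutor = null;
        }
    }

    public static void main(String[] args) {
        Owner owner = new Owner();
        System.out.println(owner.submit("before-stop")); // prints "true"
        owner.stop();
        System.out.println(owner.submit("after-stop"));  // prints "false"
    }
}
```

As noted in the quoted comment, this only protects the call sites that are rewritten this way. Since the timeout executor is used in many places without null checks, not setting the field to null at all (or not stopping it in HMaster#shutdown) may be the simpler option.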

> NPE when calling HBTU.shutdownMiniCluster (TestAssignmentManagerMetrics is 
> flakey)
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-20169
>                 URL: https://issues.apache.org/jira/browse/HBASE-20169
>             Project: HBase
>          Issue Type: Bug
>          Components: test
>            Reporter: Duo Zhang
>            Assignee: stack
>            Priority: Major
>         Attachments: HBASE-20169.branch-2.001.patch, 
> HBASE-20169.branch-2.002.patch, HBASE-20169.branch-2.003.patch, 
> HBASE-20169.branch-2.004.patch, HBASE-20169.branch-2.005.patch, 
> HBASE-20169.v0.addendum.patch
>
>
> This usually happens when some master or rs has already gone down before we 
> call shutdownMiniCluster.
> See
> https://builds.apache.org/job/HBASE-Flaky-Tests/27223/testReport/junit/org.apache.hadoop.hbase.master/TestAssignmentManagerMetrics/org_apache_hadoop_hbase_master_TestAssignmentManagerMetrics/
> and also
> http://104.198.223.121:8080/job/HBASE-Flaky-Tests/34873/testReport/junit/org.apache.hadoop.hbase.master/TestRestartCluster/testRetainAssignmentOnRestart/
> {noformat}
> java.lang.NullPointerException
>       at org.apache.hadoop.hbase.master.TestAssignmentManagerMetrics.after(TestAssignmentManagerMetrics.java:100)
> java.lang.NullPointerException
>       at org.apache.hadoop.hbase.master.TestRestartCluster.testRetainAssignmentOnRestart(TestRestartCluster.java:156)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
