[ 
https://issues.apache.org/jira/browse/HBASE-26596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17476947#comment-17476947
 ] 

Yutong Xiao commented on HBASE-26596:
-------------------------------------

[~vjasani] The PR has been pending for a while. I am not sure whether the 
latest commit is OK to merge. Could you please take a look?

> region_mover should gracefully ignore null response from 
> RSGroupAdmin#getRSGroupOfServer
> ----------------------------------------------------------------------------------------
>
>                 Key: HBASE-26596
>                 URL: https://issues.apache.org/jira/browse/HBASE-26596
>             Project: HBase
>          Issue Type: Bug
>          Components: mover, rsgroup
>    Affects Versions: 1.7.1
>            Reporter: Viraj Jasani
>            Assignee: Yutong Xiao
>            Priority: Major
>
> If regionserver has any non-daemon thread running even after its own 
> shutdown, the running non-daemon thread can prevent clean JVM exit and 
> regionserver could be stuck in the zombie state. We have recently provided a 
> workaround for this in HBASE-26468 for regionserver exit hook to wait 30s for 
> all non-daemon threads to get stopped before terminating JVM abnormally.
> However, if regionserver is stuck in such state, region_mover unload fails 
> with:
> {code:java}
> NoMethodError: undefined method `getName` for nil:NilClass
>   getSameRSGroupServers at /bin/region_mover.rb:503
>              __ensure__ at /bin/region_mover.rb:313 
>           unloadRegions at /bin/region_mover.rb:310               
>                  (root) at /bin/region_mover.rb:572               
>  {code}
> This happens if the cluster has RSGroup enabled and the given server is 
> already stopped, hence RSGroupAdmin#getRSGroupOfServer would return null (as 
> the server is not running anymore so it is not part of any RSGroup). 
> region_mover should ride over this null response and gracefully exit from 
> unloadRegions() call.
>  
> We should also check if the fix is applicable to branch-2 and above.
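
A minimal sketch of the null-tolerant lookup described above, in plain Ruby. 
It is not the actual patch: RSGroupInfo and FakeRSGroupAdmin are hypothetical 
stand-ins for the Java objects that region_mover.rb reaches through JRuby, and 
same_rsgroup_servers is an illustrative name rather than the script's real 
helper. The only behaviour taken from the report is that getRSGroupOfServer 
can return nil for a server that is already stopped.

```ruby
# Hypothetical stand-in for the Java RSGroupInfo object.
RSGroupInfo = Struct.new(:name, :servers) do
  def getName
    name
  end

  def getServers
    servers
  end
end

# Hypothetical stand-in for RSGroupAdmin; only the one call is modelled.
class FakeRSGroupAdmin
  def initialize(groups)
    @groups = groups
  end

  # Returns nil when the server belongs to no group (e.g. it is stopped).
  def getRSGroupOfServer(server)
    @groups.find { |g| g.getServers.include?(server) }
  end
end

# Returns the other live servers in the same RSGroup as `hostname`,
# or nil when the group cannot be determined, so the caller can exit
# cleanly instead of calling getName on nil.
def same_rsgroup_servers(admin, live_servers, hostname)
  group = admin.getRSGroupOfServer(hostname)
  if group.nil?
    warn "No RSGroup found for #{hostname}; skipping RSGroup filtering"
    return nil
  end
  live_servers.select { |s| s != hostname && group.getServers.include?(s) }
end

admin = FakeRSGroupAdmin.new(
  [RSGroupInfo.new('default', ['rs1:16020', 'rs2:16020'])]
)
live = ['rs1:16020', 'rs2:16020', 'rs3:16020']

p same_rsgroup_servers(admin, live, 'rs1:16020')  # => ["rs2:16020"]
p same_rsgroup_servers(admin, live, 'rs9:16020')  # => nil (plus a warning)
```

Returning nil rather than an empty list lets the caller tell "no group 
information" apart from "no other servers in the group"; an unload routine 
could check for nil and return early.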



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
