[
https://issues.apache.org/jira/browse/HBASE-26596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17463616#comment-17463616
]
Yutong Xiao edited comment on HBASE-26596 at 12/22/21, 7:30 AM:
----------------------------------------------------------------
Added a nil check in getSameRSGroupServers
{code:ruby}
# If the rsgroup is nil, that means this server belongs to no rsgroup.
# It should be already offline.
# So we just return and do nothing more.
if rsgroup.nil?
  $LOG.warn("The server " + hostname + " belongs to no rsgroup. Is it already offline?")
  return results
end
{code}
so that it just returns an empty list.
In unloadRegions, if the returned server list is empty, it exits
(/bin/region_mover.rb:324).
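For illustration, here is a minimal standalone sketch of how the guard and the empty-list check could fit together. It is not the actual region_mover.rb code: FakeGroup, FakeRSGroupAdmin, same_rsgroup_servers and unload_regions are hypothetical stand-ins for the JRuby calls into RSGroupAdmin, and the caller returns a symbol instead of exiting so the behaviour is easy to check.

```ruby
require 'logger'

$LOG = Logger.new($stderr)

# Hypothetical stand-ins for the RSGroup admin API. In region_mover.rb these
# would be Java objects reached through JRuby; the names here are assumptions.
FakeGroup = Struct.new(:name, :servers)

class FakeRSGroupAdmin
  def initialize(groups)
    @groups = groups
  end

  # Returns the group containing hostname, or nil when the server is in none.
  def rsgroup_of_server(hostname)
    @groups.find { |g| g.servers.include?(hostname) }
  end
end

# Sketch of the guarded lookup: the other servers in the same rsgroup,
# or an empty list when the server belongs to no rsgroup.
def same_rsgroup_servers(admin, hostname)
  results = []
  rsgroup = admin.rsgroup_of_server(hostname)
  if rsgroup.nil?
    $LOG.warn("The server #{hostname} belongs to no rsgroup. Is it already offline?")
    return results
  end
  rsgroup.servers.each { |s| results << s unless s == hostname }
  results
end

# Sketch of the caller: with no candidate target servers there is nothing to do.
# The real script exits at this point; the sketch returns a symbol instead.
def unload_regions(admin, hostname)
  targets = same_rsgroup_servers(admin, hostname)
  if targets.empty?
    $LOG.warn("No target servers found for #{hostname}; nothing to unload.")
    return :skipped
  end
  :unloaded
end

admin = FakeRSGroupAdmin.new([FakeGroup.new('default', %w[rs1 rs2 rs3])])
puts unload_regions(admin, 'rs1')
puts unload_regions(admin, 'rs-dead')
```

The point of returning an empty list rather than raising is that the caller already has a path for "no servers to move to", so the nil case can reuse it.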
was (Author: xytss123):
Added a nil check in getSameRSGroupServers
{code:ruby}
# If the rsgroup is nil, that means this server belongs to no rsgroup.
# It should be already offline.
# So we just return and do nothing more.
if rsgroup.nil?
  $LOG.warn("The server " + hostname + " belongs to no rsgroup. Is it already offline?")
  return results
end
{code}
We can return an empty list.
In unloadRegions if the returned server list is empty, it will exit.
(/bin/region_mover.rb:324)
> region_mover should gracefully ignore null response from
> RSGroupAdmin#getRSGroupOfServer
> ----------------------------------------------------------------------------------------
>
> Key: HBASE-26596
> URL: https://issues.apache.org/jira/browse/HBASE-26596
> Project: HBase
> Issue Type: Bug
> Components: mover, rsgroup
> Affects Versions: 1.7.1
> Reporter: Viraj Jasani
> Assignee: Yutong Xiao
> Priority: Major
>
> If a regionserver has any non-daemon thread still running after its own
> shutdown, that thread can prevent a clean JVM exit and the regionserver
> could be stuck in a zombie state. We recently provided a workaround for
> this in HBASE-26468: the regionserver exit hook waits 30s for all non-daemon
> threads to stop before terminating the JVM abnormally. However, if a
> regionserver is stuck in such a state, region_mover unload fails with:
> {code:java}
> NoMethodError: undefined method `getName` for nil:NilClass
> getSameRSGroupServers at /bin/region_mover.rb:503
> __ensure__ at /bin/region_mover.rb:313
> unloadRegions at /bin/region_mover.rb:310
> (root) at /bin/region_mover.rb:572
> {code}
> This happens if the cluster has RSGroup enabled and the given server is
> already stopped: RSGroupAdmin#getRSGroupOfServer then returns null, because
> a server that is no longer running is not part of any RSGroup.
> region_mover should tolerate this null response and exit gracefully from
> the unloadRegions() call.
>
> We should also check if the fix is applicable to branch-2 and above.