[
https://issues.apache.org/jira/browse/HBASE-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093297#comment-13093297
]
Ming Ma commented on HBASE-4273:
--------------------------------
Regarding why regionLocation could be null: it comes from a createTable failure.
Here are the scenarios. Let us assume we are dealing with a large number of
regions, so createTable, disableTable, and enableTable can take a long time and
HMaster can restart in the middle. Also, as part of the fix in hbase-3229, the
table state is set to ENABLING at the beginning of the operation and ENABLED at
the end of the operation. Previously, the table state was set to ENABLED at the
beginning of the operation.
t1: Application calls createTable. The table's state is set to ENABLING ( or
ENABLED without hbase-3229 ).
t2: In createTable, HMaster updates .META. with regioninfo and a null
regionLocation.
t3: In createTable, before region assignment starts or finishes, HMaster
shuts down. That means certain regions will have a null regionLocation.
t4: HMaster restarts or the other HMaster takes over. The table's state is
ENABLING ( or ENABLED without hbase-3229 ). AssignmentManager will continue
processing the enable of the table ( or invoke
AssignmentManager.assignUserRegions without hbase-3229 ).
t5: Application calls disableTable before all the regions are fully assigned,
so there are still regions with a null regionLocation. The table's state is set
to DISABLING.
t6: Before the disableTable operation finishes, HMaster restarts.
In other words:
1. With the latest trunk, a region could have a null regionLocation while the
table state is DISABLING or ENABLING.
2. Without the fix of hbase-3229, a region could have a null regionLocation
while the table state is DISABLING or ENABLING or ENABLED.
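The two cases can be restated as a small lookup. This is only an illustrative sketch: NullLocationStates and its TableState enum are hypothetical names made up for this comment, not HBase classes.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical helper for illustration; not part of HBase.
class NullLocationStates {
    enum TableState { ENABLING, ENABLED, DISABLING, DISABLED }

    // Table states in which a region may still have a null regionLocation,
    // according to the two cases listed above.
    static Set<TableState> possibleStates(boolean hasHbase3229Fix) {
        Set<TableState> states = EnumSet.of(TableState.ENABLING, TableState.DISABLING);
        if (!hasHbase3229Fix) {
            // Without the fix, the table is marked ENABLED at the beginning of
            // the operation, before region assignment has completed.
            states.add(TableState.ENABLED);
        }
        return states;
    }
}
```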
Regarding how the system handles null regionLocation, it seems to be ok.
1. As long as there is an entry in zookeeper for this RIT, eventually it should
be taken care of by AssignmentManager.processRegionInTransition and RS.
2. There is a chance that such a region doesn't have an entry in zookeeper, for
example, if HMaster restarts before createTable starts the bulk assignment
process. With the fix of hbase-3229, the table will be in ENABLING state and
thus will eventually get to ENABLED state; the region will be assigned in the
process. Prior to the fix of hbase-3229, the table could be in ENABLED state
with such a region.
A couple of suggestions on how to fix the issue. #1 should be enough. This
issue raises other questions, thus #2, #3, #4.
1. In AssignmentManager.rebuildUserRegions, remove the following lines inside
"if (regionLocation == null)" block.
    if (false == checkIfRegionBelongsToDisabled(regionInfo)
        && false == checkIfRegionsBelongsToEnabling(regionInfo)) {
      regions.put(regionInfo, regionLocation);
    }
2. An application can currently call disableTable while the table is in
ENABLING state. That could cause some issues: the system will try to unassign
regions while regions are being assigned. Can we allow disableTable only when
the table is in ENABLED state, and enableTable only when the table is in
DISABLED state?
3. After HMaster finishes initialization, it sets initialized==true. Before
initialization is done, an application can still access HMaster, given that
isMasterRunning() returns true. Is this by design, or should we wait until
HMaster.isInitialized() returns true? A couple of services need to be
initialized before HMaster can accept requests.
4. Enhance hbck to report null regionLocation and to validate consistency
between .META. state and zookeeper state.
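For suggestion #1, the intended effect is that a region with a null regionLocation never lands in the in-memory regions map, so the bulk disabler never asks to close a region on a null server. A minimal sketch of that behavior, using plain strings in place of HRegionInfo and the server address; RebuildUserRegionsSketch is a hypothetical stand-in, and the real rebuildUserRegions does more work for regions that do have a location:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the region map rebuild; not HBase code.
class RebuildUserRegionsSketch {
    // metaRows maps region name -> server name as read from .META.
    // (null when the region was never assigned).
    static Map<String, String> rebuild(Map<String, String> metaRows) {
        Map<String, String> regions = new LinkedHashMap<>();
        for (Map.Entry<String, String> row : metaRows.entrySet()) {
            String regionLocation = row.getValue();
            if (regionLocation == null) {
                // Suggestion #1: skip the region instead of putting a null
                // location into the map. It is expected to be handled via its
                // RIT entry in zookeeper or by the enable-table processing.
                continue;
            }
            regions.put(row.getKey(), regionLocation);
        }
        return regions;
    }
}
```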
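For suggestion #2, the check could look roughly like the following. This is a sketch under the assumption that the table state is available as a simple enum; TableStateGuard and its methods are hypothetical names, not existing HBase APIs.

```java
// Hypothetical guard for illustration; not part of HBase.
class TableStateGuard {
    enum TableState { ENABLING, ENABLED, DISABLING, DISABLED }

    // disableTable is accepted only when the table is fully ENABLED.
    static boolean canDisable(TableState current) {
        return current == TableState.ENABLED;
    }

    // enableTable is accepted only when the table is fully DISABLED.
    static boolean canEnable(TableState current) {
        return current == TableState.DISABLED;
    }

    // Returns the transitional state the table would move to, or rejects the
    // request if the table is in the middle of another operation.
    static TableState requestDisable(TableState current) {
        if (!canDisable(current)) {
            throw new IllegalStateException("Cannot disable table in state " + current);
        }
        return TableState.DISABLING;
    }

    static TableState requestEnable(TableState current) {
        if (!canEnable(current)) {
            throw new IllegalStateException("Cannot enable table in state " + current);
        }
        return TableState.ENABLING;
    }
}
```

With such a check, the disableTable call at t5 in the timeline above would be rejected because the table is still ENABLING.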
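For suggestion #3, one option is to have request handlers check the initialized flag before doing any work. A minimal sketch, assuming a single flag that is flipped once at the end of initialization; MasterInitGate and its methods are hypothetical and only illustrate the idea:

```java
// Hypothetical gate for illustration; not the actual HMaster code.
class MasterInitGate {
    // volatile so that request-handling threads see the update made by the
    // thread that finishes initialization.
    private volatile boolean initialized = false;

    void markInitialized() {
        initialized = true;
    }

    boolean isInitialized() {
        return initialized;
    }

    // Called at the top of every request that depends on master services.
    void checkInitialized() {
        if (!initialized) {
            throw new IllegalStateException("Master is not yet initialized");
        }
    }

    // Example request: rejected until initialization has finished.
    String disableTable(String tableName) {
        checkInitialized();
        return "disable requested for " + tableName;
    }
}
```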
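For suggestion #4, the check itself is simple once the .META. rows and the set of regions in transition have been read. The sketch below assumes both have already been loaded into plain collections; MetaZkConsistencySketch is a hypothetical name and this is not how hbck is structured today.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical consistency check for illustration; not part of hbck.
class MetaZkConsistencySketch {
    // Regions whose .META. row has no regionLocation.
    static List<String> nullLocationRegions(Map<String, String> metaLocations) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> row : metaLocations.entrySet()) {
            if (row.getValue() == null) {
                result.add(row.getKey());
            }
        }
        return result;
    }

    // Regions with a null regionLocation and no RIT entry in zookeeper.
    // Nothing is tracking these, so they are the ones worth reporting loudly.
    static List<String> unassignedWithoutRit(Map<String, String> metaLocations,
                                             Set<String> regionsInTransition) {
        List<String> result = new ArrayList<>();
        for (String region : nullLocationRegions(metaLocations)) {
            if (!regionsInTransition.contains(region)) {
                result.add(region);
            }
        }
        return result;
    }
}
```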
Comments?
> java.lang.NullPointerException when a table is being disabled and HMaster
> restarts
> ----------------------------------------------------------------------------------
>
> Key: HBASE-4273
> URL: https://issues.apache.org/jira/browse/HBASE-4273
> Project: HBase
> Issue Type: Bug
> Reporter: Ming Ma
> Assignee: Ming Ma
>
> This bug occurs in the following scenario.
> 1. For some reason, the regionLocation isn't set in .META. table for some
> regions. Perhaps createTable didn't complete successfully.
> 2. The table of those regions is being disabled.
> 3. HMaster restarted.
> 4. At HMaster startup, it tries to transition from disabling to disabled
> state. It got the following exception.
> java.lang.NullPointerException: Passed server is null
>   at org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:581)
>   at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1093)
>   at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1040)
>   at org.apache.hadoop.hbase.master.handler.DisableTableHandler$BulkDisabler$1.run(DisableTableHandler.java:132)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> In AssignmentManager.rebuildUserRegions, it added such regions to its regions
> list,
>     if (regionLocation == null) {
>       // Region not being served, add to region map with no assignment
>       // If this needs to be assigned out, it will also be in ZK as RIT
>       // add if the table is not in disabled and enabling state
>       if (false == checkIfRegionBelongsToDisabled(regionInfo)
>           && false == checkIfRegionsBelongsToEnabling(regionInfo)) {
>         regions.put(regionInfo, regionLocation);
>       }
> Perhaps, it should be
>     if (regionLocation == null) {
>       // Region not being served, add to region map with no assignment
>       // If this needs to be assigned out, it will also be in ZK as RIT
>       // add only if the table is in enabled state
>       if (true == checkIfRegionBelongsToEnabled(regionInfo)) {
>         regions.put(regionInfo, regionLocation);
>       }