[jira] Commented: (HBASE-1104) Doubly-assigned regions redux

Jim Kellerman (JIRA) Mon, 05 Jan 2009 16:26:09 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661000#action_12661000
 ]


Jim Kellerman commented on HBASE-1104:
--------------------------------------

Ok, I can see where this could be confusing. In HBASE-543, a newly discovered
region would be set to 'unassigned'.

- If a region is 'unassigned', it is a candidate to be opened by the next
  region server that checks in.
- If the region is assigned to a region server, it is marked as 'assigned'.
- When the region server reports that it has opened the region, it is marked
  as 'pending'.

Once ProcessRegionOpen runs and the HRS has been stored in the META table,
the region is removed from the regionsInTransition Map.
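
The open lifecycle described above can be sketched as a small state machine.
This is only an illustration: the names OpenState, OpenTracker and the method
names are hypothetical and are not taken from the HBase master code. The only
things borrowed from the description are the three states and the removal
from regionsInTransition after ProcessRegionOpen has updated META.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout; this is not the HBase master's actual code.
enum OpenState { UNASSIGNED, ASSIGNED, PENDING }

final class OpenTracker {
    // Stand-in for the master's regionsInTransition Map, keyed by region name.
    private final Map<String, OpenState> regionsInTransition = new HashMap<>();

    // A newly discovered region starts out unassigned.
    void discovered(String region) {
        regionsInTransition.put(region, OpenState.UNASSIGNED);
    }

    // The master hands the region to a region server that checked in.
    void assignedToServer(String region) {
        transition(region, OpenState.UNASSIGNED, OpenState.ASSIGNED);
    }

    // The region server reports that it has opened the region.
    void reportedOpen(String region) {
        transition(region, OpenState.ASSIGNED, OpenState.PENDING);
    }

    // ProcessRegionOpen has stored the server in META; tracking ends here.
    void metaUpdated(String region) {
        requireState(region, OpenState.PENDING);
        regionsInTransition.remove(region);
    }

    // Returns null when the region is not in transition.
    OpenState stateOf(String region) {
        return regionsInTransition.get(region);
    }

    private void transition(String region, OpenState from, OpenState to) {
        requireState(region, from);
        regionsInTransition.put(region, to);
    }

    private void requireState(String region, OpenState expected) {
        OpenState actual = regionsInTransition.get(region);
        if (actual != expected) {
            throw new IllegalStateException(
                region + ": expected " + expected + " but was " + actual);
        }
    }
}
```

Rejecting out-of-order transitions is a choice made for this sketch, so that
a region cannot skip a step without it being noticed.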

- When it is determined that a region should be closed, the region is marked
  as 'closing'.
- When the master sends the close message to the HRS, the region's status is
  set to closing + closed (and if the region is being offlined in the process,
  the status is closing + closed + offlined).

Once the HRS reports that a region is closed, ProcessRegionClose is called.
If the region should be reassigned (i.e., offlined == false), then the region
status is set to unassigned so that it will get picked up and assigned to the
first region server that reports in and is not overloaded.

If the region has been offlined, ProcessRegionClose will remove the region
from the regionsInTransition Map.

Ok, so what does this boil down to? There are three states for getting a
region served:
1) unassigned 
2) assigned 
3) pending

However, for regions being closed it is more complex:
- closing means the region is in the process of being closed.
- closing + closed means that the master has told the HRS to close the region.
- closing + offline means that the master wants to close the region and have
  it offlined.
- closing + closed + offline means that the master has told the HRS to close
  the region, and that it will be offlined once the HRS reports that it has
  closed the region.
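
One way to picture the four combinations above is as a set of flags. The
sketch below is illustrative only: CloseFlag, CloseFlags and describe() are
hypothetical names, and the current master code may well represent these
states differently.

```java
import java.util.Set;

// Hypothetical flag names; they mirror the words used in the list above.
enum CloseFlag { CLOSING, CLOSED, OFFLINED }

final class CloseFlags {
    private CloseFlags() {}

    // Translates a combination of flags into the meaning given in the list.
    static String describe(Set<CloseFlag> flags) {
        if (!flags.contains(CloseFlag.CLOSING)) {
            return "not being closed";
        }
        boolean hrsTold = flags.contains(CloseFlag.CLOSED);
        boolean offline = flags.contains(CloseFlag.OFFLINED);
        if (hrsTold && offline) {
            return "HRS told to close; region offlined once close is reported";
        }
        if (hrsTold) {
            return "HRS told to close";
        }
        if (offline) {
            return "master wants region closed and offlined";
        }
        return "in the process of being closed";
    }
}
```

For example, describe(EnumSet.of(CloseFlag.CLOSING, CloseFlag.CLOSED)) gives
"HRS told to close". Note that in this scheme the flag named "closed" means
the close message has been sent, not that the HRS has reported the close.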

The reason for this approach was that if a region was closing, it could not
be marked as unassigned. Only ProcessRegionClose would know if the region
should be reassigned, and if not, it would remove the region from the
regionsInTransition Map. If the region was to be reassigned, it would stay in
the map and its status would be changed to 'unassigned'.

As opening a region requires three states (unassigned, assigned, pending),
closing a region should be similar:
- close -- the region server should be told that the region is to be closed
  when the HRS reports in
- closing -- the HRS has been told to close the region
- closed -- the HRS reports that the region is closed

When a region has a status of closing, it also has a substatus of closing
and/or offlined. If offlined, and the status == closed, then the master should
remove the region from the regionsInTransition Map. If not offlined, the
region should have its status set to unassigned.
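
A minimal sketch of the proposed three-step close follows. It is only an
illustration of the idea: ProposedState, CloseTracker and the method names
are hypothetical, and only the 'offlined' substatus is modelled. The point it
tries to show is that a region becomes 'unassigned' (and so eligible for
reassignment) only after the HRS has reported the close.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names; a sketch of the proposal, not HBase's actual code.
enum ProposedState { UNASSIGNED, CLOSE, CLOSING, CLOSED }

final class CloseTracker {
    // Per-region status plus the 'offlined' substatus.
    private static final class Entry {
        ProposedState state;
        final boolean offlined;

        Entry(ProposedState state, boolean offlined) {
            this.state = state;
            this.offlined = offlined;
        }
    }

    // Stand-in for the master's regionsInTransition Map.
    private final Map<String, Entry> regionsInTransition = new HashMap<>();

    // close: the region is to be closed the next time its HRS reports in.
    void markForClose(String region, boolean offlined) {
        regionsInTransition.put(region, new Entry(ProposedState.CLOSE, offlined));
    }

    // closing: the master has told the HRS to close the region.
    void closeMessageSent(String region) {
        advance(region, ProposedState.CLOSE, ProposedState.CLOSING);
    }

    // closed: the HRS reports the close. Only now is the region either
    // dropped from the map (offlined) or made available for reassignment.
    void reportedClosed(String region) {
        Entry e = advance(region, ProposedState.CLOSING, ProposedState.CLOSED);
        if (e.offlined) {
            regionsInTransition.remove(region);
        } else {
            e.state = ProposedState.UNASSIGNED;
        }
    }

    // Returns null when the region is not in transition.
    ProposedState stateOf(String region) {
        Entry e = regionsInTransition.get(region);
        return e == null ? null : e.state;
    }

    private Entry advance(String region, ProposedState from, ProposedState to) {
        Entry e = regionsInTransition.get(region);
        if (e == null || e.state != from) {
            throw new IllegalStateException(region + ": expected " + from);
        }
        e.state = to;
        return e;
    }
}
```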

So that is how it should work, but because starting up a region requires
three state transitions and closing one down currently only requires two, it
is confusing.

Changing region close to be symmetrical with region open should clarify (and
simplify) how regions get reassigned.



> Doubly-assigned regions redux
> -----------------------------
>
>                 Key: HBASE-1104
>                 URL: https://issues.apache.org/jira/browse/HBASE-1104
>             Project: Hadoop HBase
>          Issue Type: Bug
>         Environment: pset cluster with TRUNK.
>            Reporter: stack
>            Assignee: Jim Kellerman
>             Fix For: 0.19.0
>
>
> Testing, I see doubly assigned regions.  Below is from master log for 
> TestTable,0000135598,1230761605500.
> {code}
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO 
> org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_SPLIT: 
> TestTable,0000116170,1230761152219: TestTable,0000116170,1230761152219 split; 
> daughters: TestTable,0000116170,1230761605500, 
> TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:35,528 [IPC Server handler 2 on 60000] INFO 
> org.apache.hadoop.hbase.master.RegionManager: assigning region 
> TestTable,0000135598,1230761605500 to server XX.XX.XX.142:60020
> 2008-12-31 22:13:38,561 [IPC Server handler 6 on 60000] INFO 
> org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: 
> TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO 
> org.apache.hadoop.hbase.master.ProcessRegionOpen$1: 
> TestTable,0000135598,1230761605500 open on XX.XX.XX.142:60020
> 2008-12-31 22:13:38,562 [HMaster] INFO 
> org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row 
> TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 
> 1230759988953 and server XX.XX.XX.142:60020
> 2008-12-31 22:13:44,640 [IPC Server handler 4 on 60000] DEBUG 
> org.apache.hadoop.hbase.master.RegionManager: Going to close region 
> TestTable,0000135598,1230761605500
> 2008-12-31 22:13:50,441 [IPC Server handler 9 on 60000] INFO 
> org.apache.hadoop.hbase.master.RegionManager: assigning region 
> TestTable,0000135598,1230761605500 to server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,457 [IPC Server handler 5 on 60000] INFO 
> org.apache.hadoop.hbase.master.ServerManager: Received 
> MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from 
> XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [IPC Server handler 5 on 60000] INFO 
> org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: 
> TestTable,0000135598,1230761605500 from XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO 
> org.apache.hadoop.hbase.master.ProcessRegionOpen$1: 
> TestTable,0000135598,1230761605500 open on XX.XX.XX.139:60020
> 2008-12-31 22:13:53,458 [HMaster] INFO 
> org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row 
> TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 
> 1230759988788 and server XX.XX.XX.139:60020
> 2008-12-31 22:13:53,688 [IPC Server handler 6 on 60000] INFO 
> org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_CLOSE: 
> TestTable,0000135598,1230761605500 from XX.XX.XX.142:60020
> 2008-12-31 22:13:53,688 [HMaster] DEBUG 
> org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessRegionClose 
> of TestTable,0000135598,1230761605500, false
> 2008-12-31 22:13:54,263 [IPC Server handler 7 on 60000] INFO 
> org.apache.hadoop.hbase.master.RegionManager: assigning region 
> TestTable,0000135598,1230761605500 to server XX.XX.XX.141:60020
> 2008-12-31 22:13:57,273 [IPC Server handler 9 on 60000] INFO 
> org.apache.hadoop.hbase.master.ServerManager: Received 
> MSG_REPORT_PROCESS_OPEN: TestTable,0000135598,1230761605500 from 
> XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [IPC Server handler 0 on 60000] INFO 
> org.apache.hadoop.hbase.master.ServerManager: Received MSG_REPORT_OPEN: 
> TestTable,0000135598,1230761605500 from XX.XX.XX.141:60020
> 2008-12-31 22:14:03,917 [HMaster] INFO 
> org.apache.hadoop.hbase.master.ProcessRegionOpen$1: 
> TestTable,0000135598,1230761605500 open on XX.XX.XX.141:60020
> 2008-12-31 22:14:03,918 [HMaster] INFO 
> org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row 
> TestTable,0000135598,1230761605500 in region .META.,,1 with startcode 
> 1230759989031 and server XX.XX.XX.141:60020
> 2008-12-31 22:14:29,350 [RegionManager.metaScanner] DEBUG 
> org.apache.hadoop.hbase.master.BaseScanner: 
> TestTable,0000135598,1230761605500 no longer has references to 
> TestTable,0000116170,1230761152219
> {code}
> See how we choose to assign before we get the close back from the 
> regionserver.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
