[ https://issues.apache.org/jira/browse/HBASE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-2065:
--------------------------------------

    Attachment: HBASE-2065-2.patch

This new patch attempts to fix the issue seen in Hudson: a region being opened 
while its table was being disabled was falling through the cracks. It also fixes 
a new issue found with more testing. A region that the master has closed and 
set as unassigned is seen by TableOperation as an assigned region, because its 
server info is still in .META., so ChangeTableState switches it from 
"unassigned" to "closing". The region is then never reassigned, nor is it ever 
seen as closed.
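To make the second issue concrete, here is a small, simplified model of it. This is not HBase code. StaleServerInfoModel, its State enum and its methods are hypothetical names. The model assumes that the "is this region assigned?" decision is based only on whether the region's .META. row still carries server info.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the stale-server-info problem; none of these names are
// HBase classes, and regions are plain strings.
class StaleServerInfoModel {
  enum State { OPEN, UNASSIGNED, CLOSING }

  // What the master believes about each region.
  final Map<String, State> masterState = new HashMap<>();
  // What .META. holds: region name -> server address; no entry = no server info.
  final Map<String, String> metaServer = new HashMap<>();

  // Assumption: the scan treats any region whose .META. row still has server
  // info as assigned.
  boolean looksAssigned(String region) {
    return metaServer.containsKey(region);
  }

  // Disabling: every region that looks assigned is switched to CLOSING,
  // whatever state the master currently has for it.
  void disable(List<String> regions) {
    for (String region : regions) {
      if (looksAssigned(region)) {
        masterState.put(region, State.CLOSING);
      }
    }
  }

  // The master closes a region and marks it unassigned. clearMeta == false
  // mimics the bug: the old server info is left behind in .META.
  void closeRegion(String region, boolean clearMeta) {
    masterState.put(region, State.UNASSIGNED);
    if (clearMeta) {
      metaServer.remove(region);
    }
  }
}
```

Take a closed region whose row still carries "10.10.1.54:60853". After disable() it ends up CLOSING, yet no server is left to close it. With clearMeta set to true it stays UNASSIGNED.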

More comments:

TestAdmin:
- I added a more difficult test that both enables and disables the table, to 
root out more issues.

HRegion:
- Refactored offlineRegionInMETA to extract a new method, 
removeServerInfoInMETA, which does what its name says: it removes the region's 
server info from .META.

ProcessRegionClose:
- Calls the new HRegion method to clean up the region's .META. entry.
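The refactoring in HRegion and its use from ProcessRegionClose can be sketched as follows. .META. is stood in for by an in-memory map, and the column names are assumptions modelled on the "info:" catalog family. MetaRefactorSketch and processRegionClose are hypothetical names. The "offline=true" string is a placeholder for a serialized region info with its offline flag set. This is a simplified reading of the change, not the patch itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for .META.: row key -> (column -> value).
// Column names are assumptions, not taken from the patch.
class MetaRefactorSketch {
  static final String REGIONINFO = "info:regioninfo";
  static final String SERVER = "info:server";
  static final String STARTCODE = "info:serverstartcode";

  final Map<String, Map<String, String>> meta = new HashMap<>();

  // Extracted helper: drop only the server location columns of a region's row.
  void removeServerInfoInMETA(String regionRow) {
    Map<String, String> row = meta.get(regionRow);
    if (row != null) {
      row.remove(SERVER);
      row.remove(STARTCODE);
    }
  }

  // Offlining still marks the region offline, then reuses the extracted helper.
  void offlineRegionInMETA(String regionRow) {
    meta.computeIfAbsent(regionRow, k -> new HashMap<>()).put(REGIONINFO, "offline=true");
    removeServerInfoInMETA(regionRow);
  }

  // Close handling only needs the cleanup part, so a closed region no longer
  // looks assigned to a later table scan.
  void processRegionClose(String regionRow) {
    removeServerInfoInMETA(regionRow);
  }
}
```

Extracting the helper lets the close path clear the stale location without also marking the region offline.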

ChangeTableState:
- When disabling a table, if we see a region that is pending open, we no longer 
try to mark it as "closing", since the master would be confused when the region 
server then reports that it opened the region. We now rely on the next 
modification in HBA...
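The rule for building the closing list can be sketched like this. ClosingListSketch, its State enum and regionsToClose are hypothetical names, and a region's state is reduced to three values. Regions that are pending open are left out. The assumption is that a later disable request handles them once they are actually open. Unassigned regions have nothing to close.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the disable-side rule; not the actual ChangeTableState code.
class ClosingListSketch {
  enum State { UNASSIGNED, PENDING_OPEN, OPEN }

  // Only regions that are already open go on the closing list. Pending-open
  // regions are skipped so the master is not confused by the open report that
  // the region server is about to send; unassigned ones have nothing to close.
  static List<String> regionsToClose(Map<String, State> regions) {
    List<String> closing = new ArrayList<>();
    for (Map.Entry<String, State> e : regions.entrySet()) {
      if (e.getValue() == State.OPEN) {
        closing.add(e.getKey());
      }
    }
    return closing;
  }
}
```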

HBaseAdmin:
- When enabling/disabling, instead of calling the master's method once and then 
waiting, we now call it on every iteration of the wait loop, to take into 
account regions that are in transition, such as pending-open ones.
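The shape of the new client-side wait loop can be sketched as follows. This is not the HBaseAdmin code. RetryingDisable and disableAndWait are hypothetical names. The call to the master and the "are all regions offline?" check are passed in as a Runnable and a BooleanSupplier. The point is that the request is re-sent on every pass, so regions that were pending open during an earlier pass are picked up once they open.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of a wait loop that re-issues the request each iteration.
class RetryingDisable {
  // Returns the attempt on which all regions were offline, or -1 if that never
  // happened within maxAttempts (or the thread was interrupted).
  static int disableAndWait(Runnable sendDisableToMaster,
                            BooleanSupplier allRegionsOffline,
                            int maxAttempts,
                            long sleepMillis) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      // Re-sending lets the master see regions that were still in transition
      // (e.g. pending open) during the previous pass.
      sendDisableToMaster.run();
      if (allRegionsOffline.getAsBoolean()) {
        return attempt;
      }
      try {
        Thread.sleep(sleepMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return -1;
      }
    }
    return -1;
  }
}
```

The previous behaviour, calling once and then only polling, would leave a region that was pending open during that single call deployed forever. That matches the timeout described in the issue below.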


> Cannot disable a table if any of its region is opening at the same time
> -----------------------------------------------------------------------
>
>                 Key: HBASE-2065
>                 URL: https://issues.apache.org/jira/browse/HBASE-2065
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>    Affects Versions: 0.20.2
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.20.3, 0.21.0
>
>         Attachments: HBASE-2065-2.patch, HBASE-2065-branch.patch, 
> HBASE-2065.patch
>
>
> Also found with the test in the parent jira:
> {code}
> 2009-12-21 18:31:44,411 INFO  [IPC Server handler 0 on 60000] 
> master.RegionManager(331): Assigning region table113,,1261449026166 to 
> 10.10.1.54,60853,1261448823301
> 2009-12-21 18:31:44,411 INFO  [IPC Server handler 0 on 60000] 
> master.RegionManager(331): Assigning region table121,,1261449041385 to 
> 10.10.1.54,60853,1261448823301
> 2009-12-21 18:31:44,411 INFO  [RegionServer:1] 
> regionserver.HRegionServer(475): MSG_REGION_OPEN: table113,,1261449026166
> 2009-12-21 18:31:44,411 INFO  [RegionServer:1] 
> regionserver.HRegionServer(475): MSG_REGION_OPEN: table121,,1261449041385
> ...
> 2009-12-21 18:31:44,418 INFO  [RegionServer:1.worker] 
> regionserver.HRegion(343): region table113,,1261449026166/21044806 available; 
> sequence id is 0
> ...
> 2009-12-21 18:31:44,445 DEBUG [IPC Server handler 4 on 60000] 
> master.ChangeTableState(121): Adding region table113,,1261449026166 to 
> setClosing list
> 2009-12-21 18:31:44,446 DEBUG [main] zookeeper.ZooKeeperWrapper(392): Read 
> ZNode /hbase/root-region-server got 10.10.1.54:60853
> 2009-12-21 18:31:44,447 DEBUG [main] 
> client.HConnectionManager$TableServers(990): Found ROOT at 10.10.1.54:60853
> 2009-12-21 18:31:44,447 DEBUG [main] 
> client.HConnectionManager$TableServers(899): Cached location for .META.,,1 is 
> 10.10.1.54:60855
> 2009-12-21 18:31:44,453 DEBUG [main] 
> client.HConnectionManager$TableServers(554): Rowscanned=1, rowsOffline=0
> 2009-12-21 18:31:44,454 DEBUG [main] client.HBaseAdmin(397): Sleep. Waiting 
> for all regions to be disabled from table113
> 2009-12-21 18:31:44,554 DEBUG [main] client.HBaseAdmin(406): Wake. Waiting for 
> all regions to be disabled from table113
> ...
> 2009-12-21 18:31:44,642 INFO  [RegionServer:0] 
> regionserver.HRegionServer(475): MSG_REGION_CLOSE: table113,,1261449026166
> ...
> 2009-12-21 18:31:44,642 INFO  [RegionServer:0.worker] 
> regionserver.HRegionServer$Worker(1332): Worker: MSG_REGION_CLOSE: 
> table113,,1261449026166
> ...
> 2009-12-21 18:31:44,664 INFO  [IPC Server handler 0 on 60000] 
> master.ServerManager(421): Processing MSG_REPORT_PROCESS_OPEN: 
> table113,,1261449026166 from 10.10.1.54,60853,1261448823301; 1 of 4
> ...
> 2009-12-21 18:31:44,664 INFO  [IPC Server handler 0 on 60000] 
> master.ServerManager(421): Processing MSG_REPORT_OPEN: 
> table113,,1261449026166 from 10.10.1.54,60853,1261448823301; 3 of 4
> 2009-12-21 18:31:44,664 DEBUG [IPC Server handler 0 on 60000] 
> master.ServerManager(562): region server 10.10.1.54:60853 should not have 
> opened region table113,,1261449026166
> 2009-12-21 18:31:44,666 INFO  [RegionServer:1] 
> regionserver.HRegionServer(475): MSG_REGION_CLOSE_WITHOUT_REPORT: 
> table113,,1261449026166: Duplicate assignment
> 2009-12-21 18:31:44,666 INFO  [RegionServer:1.worker] 
> regionserver.HRegionServer$Worker(1332): Worker: 
> MSG_REGION_CLOSE_WITHOUT_REPORT: table113,,1261449026166: Duplicate assignment
> {code}
> Here the master reassigned table113 and told the old region server to close 
> the region before the new one was able to report that it had opened it. In the 
> end, the new region server (the good one) is also told to close it. After that 
> my test times out: table113 is neither disabled nor deployed.

-- 
This message is automatically generated by JIRA.