[ https://issues.apache.org/jira/browse/HBASE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jean-Daniel Cryans updated HBASE-2065:
--------------------------------------
    Attachment: HBASE-2065-2.patch

This new patch attempts to fix the issue seen in Hudson (a region being opened while the table is being disabled was falling in a hole) as well as a new issue found with more testing: a region closed and set unassigned by the master is seen by TableOperation as an assigned region, since the server info is still there, so ChangeTableState then switches it from "unassigned" to "closing". The region is never reassigned nor seen as closed.

More comments:

TestAdmin:
- I added a more difficult test where we enable and disable to root out more issues.

HRegion:
- Refactored offlineRegionInMETA to add a new method called removeServerInfoInMETA, which does pretty much that.

ProcessRegionClose:
- Calls the new HRegion method in order to clean the .META. entry.

ChangeTableState:
- Added that if we are disabling a table and we see a pending open region, we do not attempt to mark it as "closing", since the master will be confused when the region server reports opening the region. We will rely now on the next modif in HBA...

HBaseAdmin:
- Changed that when enabling/disabling, instead of calling the Master's method once and waiting, we call it on every iteration to take into account regions that are moving, like a pending open.
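To illustrate the HBaseAdmin change described above, here is a minimal sketch of a client loop that re-sends the disable request on every iteration instead of sending it once and then only polling. It is not the code from the patch: the Master interface, its two methods, the FakeMaster class and the retry parameters are all hypothetical stand-ins invented for this example.

```java
/** Hypothetical sketch of the client-side retry loop; not the actual HBaseAdmin code. */
final class DisableRetrySketch {

    /** Stand-in for the master's RPC surface; both methods are assumptions. */
    interface Master {
        /** Asks the master to mark every region it can currently act on as "closing". */
        void disableTable(String tableName);

        /** True once no region of the table is deployed or in transition. */
        boolean isTableDisabled(String tableName);
    }

    /**
     * Re-sends the disable request on every iteration, so a region that was
     * pending open during an earlier pass is picked up by a later one.
     *
     * @return the number of requests sent, or -1 if the retries ran out
     */
    static int disableWithRetries(Master master, String table, int maxRetries, long pauseMs) {
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            master.disableTable(table);
            if (master.isTableDisabled(table)) {
                return attempt;
            }
            try {
                Thread.sleep(pauseMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and give up
                return -1;
            }
        }
        return -1;
    }

    /** Toy master: one region stays pending open for the first few requests. */
    static final class FakeMaster implements Master {
        private int pendingPasses;
        private boolean disabled;

        FakeMaster(int pendingPasses) {
            this.pendingPasses = pendingPasses;
        }

        @Override
        public void disableTable(String tableName) {
            if (pendingPasses > 0) {
                pendingPasses--;   // region still opening: skipped on this pass
            } else {
                disabled = true;   // every region could be marked closing
            }
        }

        @Override
        public boolean isTableDisabled(String tableName) {
            return disabled;
        }
    }
}
```

With a single request followed by polling, a region skipped on the first pass would never be revisited; repeating the request lets a later pass catch it once it has finished opening.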
> Cannot disable a table if any of its region is opening at the same time
> -----------------------------------------------------------------------
>
>                 Key: HBASE-2065
>                 URL: https://issues.apache.org/jira/browse/HBASE-2065
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>    Affects Versions: 0.20.2
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.20.3, 0.21.0
>
>         Attachments: HBASE-2065-2.patch, HBASE-2065-branch.patch,
> HBASE-2065.patch
>
>
> Also found with the test in the parent jira:
> {code}
> 2009-12-21 18:31:44,411 INFO [IPC Server handler 0 on 60000]
> master.RegionManager(331): Assigning region table113,,1261449026166 to
> 10.10.1.54,60853,1261448823301
> 2009-12-21 18:31:44,411 INFO [IPC Server handler 0 on 60000]
> master.RegionManager(331): Assigning region table121,,1261449041385 to
> 10.10.1.54,60853,1261448823301
> 2009-12-21 18:31:44,411 INFO [RegionServer:1]
> regionserver.HRegionServer(475): MSG_REGION_OPEN: table113,,1261449026166
> 2009-12-21 18:31:44,411 INFO [RegionServer:1]
> regionserver.HRegionServer(475): MSG_REGION_OPEN: table121,,1261449041385
> ...
> 2009-12-21 18:31:44,418 INFO [RegionServer:1.worker]
> regionserver.HRegion(343): region table113,,1261449026166/21044806 available;
> sequence id is 0
> ...
> 2009-12-21 18:31:44,445 DEBUG [IPC Server handler 4 on 60000]
> master.ChangeTableState(121): Adding region table113,,1261449026166 to
> setClosing list
> 2009-12-21 18:31:44,446 DEBUG [main] zookeeper.ZooKeeperWrapper(392): Read
> ZNode /hbase/root-region-server got 10.10.1.54:60853
> 2009-12-21 18:31:44,447 DEBUG [main]
> client.HConnectionManager$TableServers(990): Found ROOT at
> 10.10.1.54:60853
> 2009-12-21 18:31:44,447 DEBUG [main]
> client.HConnectionManager$TableServers(899): Cached location for .META.,,1 is
> 10.10.1.54:60855
> 2009-12-21 18:31:44,453 DEBUG [main]
> client.HConnectionManager$TableServers(554): Rowscanned=1, rowsOffline=0
> 2009-12-21 18:31:44,454 DEBUG [main] client.HBaseAdmin(397): Sleep. Waiting
> for all regions to be disabled from table113
> 2009-12-21 18:31:44,554 DEBUG [main] client.HBaseAdmin(406): Wake. Waiting
> for all regions to be disabled from table113
> ...
> 2009-12-21 18:31:44,642 INFO [RegionServer:0]
> regionserver.HRegionServer(475): MSG_REGION_CLOSE: table113,,1261449026166
> ...
> 2009-12-21 18:31:44,642 INFO [RegionServer:0.worker]
> regionserver.HRegionServer$Worker(1332): Worker: MSG_REGION_CLOSE:
> table113,,1261449026166
> ...
> 2009-12-21 18:31:44,664 INFO [IPC Server handler 0 on 60000]
> master.ServerManager(421): Processing MSG_REPORT_PROCESS_OPEN:
> table113,,1261449026166 from 10.10.1.54,60853,1261448823301; 1 of 4
> ...
> 2009-12-21 18:31:44,664 INFO [IPC Server handler 0 on 60000]
> master.ServerManager(421): Processing MSG_REPORT_OPEN:
> table113,,1261449026166 from 10.10.1.54,60853,1261448823301; 3 of 4
> 2009-12-21 18:31:44,664 DEBUG [IPC Server handler 0 on 60000]
> master.ServerManager(562): region server 10.10.1.54:60853 should not have
> opened region table113,,1261449026166
> 2009-12-21 18:31:44,666 INFO [RegionServer:1]
> regionserver.HRegionServer(475): MSG_REGION_CLOSE_WITHOUT_REPORT:
> table113,,1261449026166: Duplicate assignment
> 2009-12-21 18:31:44,666 INFO [RegionServer:1.worker]
> regionserver.HRegionServer$Worker(1332): Worker:
> MSG_REGION_CLOSE_WITHOUT_REPORT: table113,,1261449026166: Duplicate assignment
> {code}
> Here the master reassigned table113 and told the old region server to close
> the region before the new one was able to report that it had opened it. In
> the end the new region server (the good one) is also told to close it. After
> that my test times out; table113 is neither disabled nor deployed.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.