[ 
https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982453#action_12982453
 ] 

Todd Lipcon commented on HBASE-3446:
------------------------------------

After digging through the logs, I found the following:

2011-01-16 18:03:26,164 DEBUG org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Offlined and split region usertable,user136857679,1295149082811.9f2822a04028c86813fe71264da5c167.; checking daughter presence
2011-01-16 18:03:26,169 ERROR org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN
org.apache.hadoop.ipc.RemoteException: java.io.IOException: Server not running
        at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2360)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1754)
...
        at $Proxy6.openScanner(Unknown Source)
        at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:260)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.isDaughterMissing(ServerShutdownHandler.java:256)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.fixupDaughter(ServerShutdownHandler.java:214)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.fixupDaughters(ServerShutdownHandler.java:196)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.processDeadRegion(ServerShutdownHandler.java:181)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:151)

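Neither the MetaReader code nor the ServerShutdownHandler has any kind of 
retry/blocking behavior built in here, so when the META scan failed with 
"Server not running", processing of the shutdown event stopped. As a result, 
many of the regions on the server being processed were left unassigned.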
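
To illustrate the kind of retry/blocking behavior that is missing, here is a 
minimal sketch in plain Java. It is not HBase code and not a proposed patch: 
the class RetrySketch, the method callWithRetries, and the parameters 
maxAttempts and pauseMs are all hypothetical names chosen for this example. It 
only shows the general idea of retrying an operation that throws IOException 
(as the META scan did above) instead of giving up on the first failure.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of HBase: retries an operation that may
// fail with an IOException, pausing between attempts.
class RetrySketch {

    static <T> T callWithRetries(Callable<T> op, int maxAttempts, long pauseMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                // Remember the failure; sleep only if another attempt remains.
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMs);
                }
            }
        }
        // Every attempt failed: surface the most recent IOException.
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulated operation: fails twice, then succeeds on the third call.
        final int[] calls = {0};
        String result = callWithRetries(() -> {
            if (++calls[0] < 3) {
                throw new IOException("Server not running");
            }
            return "scan completed";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

A real fix would also need to decide how long to wait for META to be 
reassigned and what to do when the retries run out; the fixed pause and 
attempt limit here are assumptions made only to keep the sketch short.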

> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Priority: Blocker
>
> I ran a rolling restart on a 5-node cluster with lots of regions, and 
> afterwards had LOTS of regions left orphaned. The issue appears to be that 
> ProcessServerShutdown failed because the server hosting META was restarted 
> around the same time as another server was being processed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
