[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982453#action_12982453 ]
Todd Lipcon commented on HBASE-3446:
------------------------------------
After digging through the logs, I found the following:
2011-01-16 18:03:26,164 DEBUG org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Offlined and split region usertable,user136857679,1295149082811.9f2822a04028c86813fe71264da5c167.; checking daughter presence
2011-01-16 18:03:26,169 ERROR org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN
org.apache.hadoop.ipc.RemoteException: java.io.IOException: Server not running
    at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2360)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1754)
    ...
    at $Proxy6.openScanner(Unknown Source)
    at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:260)
    at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.isDaughterMissing(ServerShutdownHandler.java:256)
    at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.fixupDaughter(ServerShutdownHandler.java:214)
    at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.fixupDaughters(ServerShutdownHandler.java:196)
    at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.processDeadRegion(ServerShutdownHandler.java:181)
    at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:151)
Neither the MetaReader code nor the ServerShutdownHandler has any kind of
retry or blocking behavior built in here. As a result, many of the regions on
the server were left unassigned.
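
To illustrate the kind of retry behavior that is missing, here is a minimal
sketch in plain Java. It is not HBase code and not a proposed patch: the class
RetrySketch, the method callWithRetries, and its parameters maxAttempts and
pauseMillis are all hypothetical names made up for this example. It only shows
the general shape of retrying a call that can fail with an IOException (such
as the "Server not running" error above) instead of giving up on the first
failure.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical illustration only; none of these names exist in HBase.
class RetrySketch {

    /**
     * Runs op, retrying when it fails with an IOException.
     * Sleeps pauseMillis between attempts and rethrows the last
     * IOException once maxAttempts attempts have failed.
     * Any other exception is propagated immediately without a retry.
     */
    static <T> T callWithRetries(Callable<T> op, int maxAttempts, long pauseMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                last = e;
                // Pause before the next attempt, but not after the final one.
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis);
                }
            }
        }
        throw last;
    }
}
```

In a sketch like this, the caller would wrap the catalog scan in a Callable
and pass it to callWithRetries, so that a META server that is briefly down
does not abort the whole shutdown handler. Whether a bounded retry or blocking
until META is reassigned is the better approach is a separate design question.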
> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
> Key: HBASE-3446
> URL: https://issues.apache.org/jira/browse/HBASE-3446
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Priority: Blocker
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and
> afterwards had LOTS of regions left orphaned. The issue appears to be that
> ProcessServerShutdown failed because the server hosting META was restarted
> around the same time as another server was being processed.