[
https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707086#comment-13707086
]
Nick Dimiduk commented on HBASE-8803:
-------------------------------------
This is on the console side, where I'm seeing one of two errors reported:
{noformat}
13/07/12 04:54:59 INFO region_mover: Moving 301 regions to hor17n39.gq1.ygridcore.net,60020,1373604896136
13/07/12 04:55:05 INFO region_mover: Moving region 70236052 (1 of 301) to server=hor17n39.gq1.ygridcore.net,60020,1373604896136
13/07/12 04:56:05 INFO region_mover: Retry 1 of maximum 5
13/07/12 04:57:05 INFO region_mover: Retry 2 of maximum 5
13/07/12 04:58:05 INFO region_mover: Retry 3 of maximum 5
13/07/12 04:59:05 INFO region_mover: Retry 4 of maximum 5
RuntimeError: Region stuck on hor17n37.gq1.ygridcore.net,60020,1373604896127,, newserver=hor17n39.gq1.ygridcore.net,60020,1373604896136
move at /usr/lib/hbase/bin/region_mover.rb:175
loadRegions at /usr/lib/hbase/bin/region_mover.rb:418
call at org/jruby/RubyProc.java:270
call at org/jruby/RubyProc.java:224
{noformat}
and
{noformat}
13/07/12 05:00:14 INFO region_mover: Moving 1 region(s) from hor18n39.gq1.ygridcore.net,60020,1373604896155with 1 threads during this cycle
13/07/12 05:00:14 INFO region_mover: Moving region 1028785192 (1 of 1) to server=hor18n37.gq1.ygridcore.net,60020,1373604896136
NativeException: org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region is not online: -ROOT-,,0
at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3491)
at org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:2186)
at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:340)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
getTable at /usr/lib/hbase/bin/region_mover.rb:74
isSuccessfulScan at /usr/lib/hbase/bin/region_mover.rb:133
unloadRegions at /usr/lib/hbase/bin/region_mover.rb:350
call at org/jruby/RubyProc.java:270
call at org/jruby/RubyProc.java:224
{noformat}
In the Master log, I see this on occasion, though it doesn't appear to happen
at the same time as client errors. Here's an example.
{noformat}
2013-07-09 23:31:40,611 INFO org.apache.hadoop.hbase.catalog.RootLocationEditor: Unsetting ROOT region location in ZooKeeper
2013-07-09 23:31:40,630 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of -ROOT-,,0.70236052 to hor18n38.gq1.ygridcore.net,60020,1373412384299, trying to assign elsewhere instead; retry=0
java.io.IOException: Call to hor18n38.gq1.ygridcore.net/206.190.52.121:60020 failed on local exception: java.io.EOFException
at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:1030)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:999)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:84)
at $Proxy11.openRegion(Unknown Source)
at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:543)
at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1697)
at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424)
at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399)
at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394)
at org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:2282)
at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.verifyAndAssignRoot(MetaServerShutdownHandler.java:129)
at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.verifyAndAssignRootWithRetries(MetaServerShutdownHandler.java:153)
at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.process(MetaServerShutdownHandler.java:82)
at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:646)
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:580)
2013-07-09 23:31:40,632 WARN org.apache.hadoop.hbase.master.AssignmentManager: Unable to find a viable location to assign region -ROOT-,,0.70236052
{noformat}
I haven't had time to investigate yet. I'll try to correlate log entries
across the different machines. I don't think any of it is related to your
patch though :)
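
For the cross-machine correlation, something like the following might be a starting point. It is only a sketch, not an existing HBase tool: the function names (parse_timestamp, entries, correlate) are made up for illustration, and it assumes each log is already in time order, that all machines have synchronized clocks in the same time zone, and that entries start with one of the two timestamp layouts seen in the output above. Lines without a timestamp (stack frames, wrapped messages) are folded into the entry before them.

```python
import heapq
import re
from datetime import datetime

# Two timestamp layouts appear in the logs quoted above:
#   console (region_mover): "13/07/12 04:54:59"        (yy/MM/dd)
#   Master log:             "2013-07-09 23:31:40,611"  (with milliseconds)
_PATTERNS = [
    (re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"), "%Y-%m-%d %H:%M:%S,%f"),
    (re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})"), "%y/%m/%d %H:%M:%S"),
]


def parse_timestamp(line):
    """Return the datetime at the start of a log line, or None (e.g. a stack frame)."""
    for regex, fmt in _PATTERNS:
        match = regex.match(line)
        if match:
            return datetime.strptime(match.group(1), fmt)
    return None


def entries(source, lines):
    """Yield (timestamp, source, text), folding continuation lines into their entry."""
    current = None
    for line in lines:
        text = line.rstrip("\n")
        ts = parse_timestamp(text)
        if ts is not None:
            if current is not None:
                yield current
            current = (ts, source, text)
        elif current is not None:
            # No timestamp: treat as a continuation of the previous entry.
            current = (current[0], source, current[2] + "\n" + text)
    if current is not None:
        yield current


def correlate(logs):
    """Merge per-machine logs (dict of source name -> iterable of lines) by time.

    Assumes every input is individually sorted by timestamp.
    """
    streams = [entries(name, lines) for name, lines in logs.items()]
    return list(heapq.merge(*streams))
```

Passing open file handles keyed by host name to correlate() gives one time-ordered list of (timestamp, source, text) tuples, which makes it easier to see whether a Master-side warning lines up with a console-side failure.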
> region_mover.rb should move multiple regions at a time
> ------------------------------------------------------
>
> Key: HBASE-8803
> URL: https://issues.apache.org/jira/browse/HBASE-8803
> Project: HBase
> Issue Type: Bug
> Components: Usability
> Affects Versions: 0.98.0, 0.94.8, 0.95.1
> Reporter: Jean-Marc Spaggiari
> Assignee: Jean-Marc Spaggiari
> Fix For: 0.98.0, 0.95.2, 0.94.10
>
> Attachments: HBASE-8803-v0-trunk.patch, HBASE-8803-v1-0.94.patch,
> HBASE-8803-v1-trunk.patch, HBASE-8803-v2-0.94.patch, HBASE-8803-v2-0.94.patch
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> When there are many regions in a cluster, rolling_restart can take hours
> because region_mover moves the regions one at a time.