[ https://issues.apache.org/jira/browse/HBASE-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023352#comment-13023352 ]
Prakash Khemani commented on HBASE-3815:
----------------------------------------
Log snippets showing the AssignmentManager repeatedly choosing server 132
(pumahbase132) for region assignment even though every attempt fails.
Shouldn't there be a global exclude list in addition to the per-region exclude list?
2011-04-17 07:14:06,312 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region realtime_domain_feed_imps_domains,e0a3d6b6,1289467228948.64f5ad9ca3f4d6a235365f10ccc4ae87. to pumahbase132.snc5.facebook.com,60020,1303046136711
2011-04-17 07:14:06,314 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of realtime_domain_feed_imps_domains,e0a3d6b6,1289467228948.64f5ad9ca3f4d6a235365f10ccc4ae87. to serverName=pumahbase132.snc5.facebook.com,60020,1303046136711, load=(requests=0, regions=81, usedHeap=155, maxHeap=31987), trying to assign elsewhere instead; retry=0
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:406)
	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
	at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:884)
	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:751)
	at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
	at $Proxy6.openRegion(Unknown Source)
	at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:547)
	at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:901)
	at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:730)
	at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:710)
	at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:92)
	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:619)
2011-04-17 07:14:06,314 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for realtime_domain_feed_imps_domains,e0a3d6b6,1289467228948.64f5ad9ca3f4d6a235365f10ccc4ae87. so generated a random one; hri=realtime_domain_feed_imps_domains,e0a3d6b6,1289467228948.64f5ad9ca3f4d6a235365f10ccc4ae87., src=, dest=pumahbase156.snc5.facebook.com,60020,1302847439345; 72 (online=72, exclude=serverName=pumahbase132.snc5.facebook.com,60020,1303046136711, load=(requests=0, regions=81, usedHeap=155, maxHeap=31987)) available servers
2011-04-17 07:19:06,097 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region realtime_domain_acts_urls_hot,9b851e7e,1290555837119.c41a23fd0bd57d1eb4c3a5ef1ed6ccac. to pumahbase132.snc5.facebook.com,60020,1303046136711
2011-04-17 07:19:06,098 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of realtime_domain_acts_urls_hot,9b851e7e,1290555837119.c41a23fd0bd57d1eb4c3a5ef1ed6ccac. to serverName=pumahbase132.snc5.facebook.com,60020,1303046136711, load=(requests=0, regions=81, usedHeap=155, maxHeap=31987), trying to assign elsewhere instead; retry=0
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:406)
	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
	at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:884)
	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:751)
	at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
	at $Proxy6.openRegion(Unknown Source)
	at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:547)
	at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:901)
	at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:730)
	at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:710)
	at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:92)
	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:619)
2011-04-17 07:19:06,098 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for realtime_domain_acts_urls_hot,9b851e7e,1290555837119.c41a23fd0bd57d1eb4c3a5ef1ed6ccac. so generated a random one; hri=realtime_domain_acts_urls_hot,9b851e7e,1290555837119.c41a23fd0bd57d1eb4c3a5ef1ed6ccac., src=, dest=pumahbase150.snc5.facebook.com,60020,1302847439118; 72 (online=72, exclude=serverName=pumahbase132.snc5.facebook.com,60020,1303046136711, load=(requests=0, regions=81, usedHeap=155, maxHeap=31987)) available servers
2011-04-17 07:19:08,018 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region realtime_domain_feed_imps_urls,beec529151bb2bf4c2715f8a6fb9423cly.bit sefi 182668488437983,1298404111923.b12b2ab3839ed40bbf0139802db3d23a.; plan=hri=realtime_domain_feed_imps_urls,beec529151bb2bf4c2715f8a6fb9423cly.bit sefi 182668488437983,1298404111923.b12b2ab3839ed40bbf0139802db3d23a., src=pumahbase156.snc5.facebook.com,60020,1302847439345, dest=pumahbase132.snc5.facebook.com,60020,1303046136711
2011-04-17 07:19:08,018 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region realtime_domain_feed_imps_urls,beec529151bb2bf4c2715f8a6fb9423cly.bit sefi 182668488437983,1298404111923.b12b2ab3839ed40bbf0139802db3d23a. to pumahbase132.snc5.facebook.com,60020,1303046136711
2011-04-17 07:19:08,020 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of realtime_domain_feed_imps_urls,beec529151bb2bf4c2715f8a6fb9423cly.bit sefi 182668488437983,1298404111923.b12b2ab3839ed40bbf0139802db3d23a. to serverName=pumahbase132.snc5.facebook.com,60020,1303046136711, load=(requests=0, regions=81, usedHeap=155, maxHeap=31987), trying to assign elsewhere instead; retry=0
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:406)
	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
	at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:884)
	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:751)
	at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
	at $Proxy6.openRegion(Unknown Source)
	at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:547)
	at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:901)
	at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:730)
	at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:710)
	at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:92)
	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:619)
2011-04-17 07:19:08,020 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for realtime_domain_feed_imps_urls,beec529151bb2bf4c2715f8a6fb9423cly.bit sefi 182668488437983,1298404111923.b12b2ab3839ed40bbf0139802db3d23a. so generated a random one; hri=realtime_domain_feed_imps_urls,beec529151bb2bf4c2715f8a6fb9423cly.bit sefi 182668488437983,1298404111923.b12b2ab3839ed40bbf0139802db3d23a., src=, dest=pumahbase193.snc5.facebook.com,60020,1302847439839; 72 (online=72, exclude=serverName=pumahbase132.snc5.facebook.com,60020,1303046136711, load=(requests=0, regions=81, usedHeap=155, maxHeap=31987)) available servers
... and this continues until late at night
2011-04-17 23:40:28,560 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region realtime_domain_imps_domains,feb8518c,1289277895527.2f56039fb808ef84bd78fcf9c15c5fb1.; plan=hri=realtime_domain_imps_domains,feb8518c,1289277895527.2f56039fb808ef84bd78fcf9c15c5fb1., src=pumahbase133.snc5.facebook.com,60020,1302847439080, dest=pumahbase132.snc5.facebook.com,60020,1303107053536
2011-04-17 23:40:28,560 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region realtime_domain_imps_domains,feb8518c,1289277895527.2f56039fb808ef84bd78fcf9c15c5fb1. to pumahbase132.snc5.facebook.com,60020,1303107053536
2011-04-17 23:40:28,562 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of realtime_domain_imps_domains,feb8518c,1289277895527.2f56039fb808ef84bd78fcf9c15c5fb1. to serverName=pumahbase132.snc5.facebook.com,60020,1303107053536, load=(requests=7214, regions=1, usedHeap=243, maxHeap=31987), trying to assign elsewhere instead; retry=0
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:406)
	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
	at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:884)
	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:751)
	at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
	at $Proxy6.openRegion(Unknown Source)
	at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:547)
	at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:901)
	at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:730)
	at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:710)
	at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:92)
	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:619)
2011-04-17 23:40:28,562 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for realtime_domain_imps_domains,feb8518c,1289277895527.2f56039fb808ef84bd78fcf9c15c5fb1. so generated a random one; hri=realtime_domain_imps_domains,feb8518c,1289277895527.2f56039fb808ef84bd78fcf9c15c5fb1., src=, dest=pumahbase170.snc5.facebook.com,60020,1302847439039; 72 (online=72, exclude=serverName=pumahbase132.snc5.facebook.com,60020,1303107053536, load=(requests=7214, regions=1, usedHeap=243, maxHeap=31987)) available servers
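The following is a minimal Java sketch of the global exclude list suggested at the top of this comment. It is a hypothetical illustration, not HBase's AssignmentManager or LoadBalancer API. It assumes servers are plain strings such as "host,port,startcode". It also assumes that destinations are chosen at random from the online servers, as in the "generated a random one" log lines. The planner skips the per-region exclude, meaning the server that just failed to open this region. It also skips every server in a cluster-wide exclude set.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Random;
import java.util.Set;

/**
 * Hypothetical sketch, not an HBase class: picks a random destination for a
 * region while honouring a per-region exclude and a cluster-wide exclude set.
 */
final class ExcludeAwarePlanner {
    private final Random random;

    ExcludeAwarePlanner(Random random) {
        this.random = random;
    }

    /**
     * @param online        servers currently online, e.g. "host,port,startcode"
     * @param regionExclude server to skip for this region only (may be null)
     * @param globalExclude servers to skip for every region
     * @return a randomly chosen eligible server, or null if none is left
     */
    String pickDestination(Collection<String> online,
                           String regionExclude,
                           Set<String> globalExclude) {
        List<String> candidates = new ArrayList<>();
        for (String server : online) {
            // Skip servers excluded for this region or for all regions.
            if (server.equals(regionExclude) || globalExclude.contains(server)) {
                continue;
            }
            candidates.add(server);
        }
        if (candidates.isEmpty()) {
            return null; // nothing eligible; the caller decides whether to wait or retry
        }
        return candidates.get(random.nextInt(candidates.size()));
    }
}
```

Consider placeholder servers rs132, rs150 and rs156, with a per-region exclude of rs150 and a global exclude of rs132. The only possible destination is rs156. If the global set is empty, rs132 remains eligible for every region except the one that just failed on it. That is the pattern in the logs above, where pumahbase132 is picked again for one region after another.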
> lb should ignore bad region servers
> -----------------------------------
>
> Key: HBASE-3815
> URL: https://issues.apache.org/jira/browse/HBASE-3815
> Project: HBase
> Issue Type: Bug
> Reporter: Prakash Khemani
>
> The load balancer should remember which region server is constantly having
> trouble opening regions and take that RS out of the equation; otherwise the
> LB goes into an unproductive loop.
> I don't have logs handy for this one.
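One way for the load balancer to "remember" a troublesome region server, as the description asks, is a small failure tracker. The sketch below is hypothetical: BadServerTracker, the failure threshold and the cool-down length are illustrative assumptions, not HBase classes or configuration. The tracker counts consecutive failed opens per server. Once a server reaches the threshold, it is excluded for a cool-down period. A successful open clears the server's history. The excludedServers() snapshot could be used as a cluster-wide exclude set.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch, not an HBase class: remembers region servers that keep
 * failing to open regions and reports them as excluded for a cool-down period.
 * Methods are synchronized so one instance can be shared by assignment threads.
 */
final class BadServerTracker {
    private final int failureThreshold;   // consecutive failures before exclusion
    private final long coolDownMillis;    // how long an excluded server stays out
    private final LongSupplier clock;     // time source in ms, injectable for tests
    private final Map<String, Integer> consecutiveFailures = new HashMap<>();
    private final Map<String, Long> excludedUntil = new HashMap<>();

    BadServerTracker(int failureThreshold, long coolDownMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.coolDownMillis = coolDownMillis;
        this.clock = clock;
    }

    /** Record a failed open (e.g. a ConnectException) against a server. */
    synchronized void recordFailure(String server) {
        int failures = consecutiveFailures.merge(server, 1, Integer::sum);
        if (failures >= failureThreshold) {
            excludedUntil.put(server, clock.getAsLong() + coolDownMillis);
            consecutiveFailures.remove(server); // count afresh once the cool-down ends
        }
    }

    /** A successful open clears the server's failure history. */
    synchronized void recordSuccess(String server) {
        consecutiveFailures.remove(server);
        excludedUntil.remove(server);
    }

    /** True while the server is still inside its cool-down window. */
    synchronized boolean isExcluded(String server) {
        Long until = excludedUntil.get(server);
        if (until == null) {
            return false;
        }
        if (clock.getAsLong() >= until) {
            excludedUntil.remove(server); // cool-down over: give the server another chance
            return false;
        }
        return true;
    }

    /** Snapshot of currently excluded servers, usable as a global exclude set. */
    synchronized Set<String> excludedServers() {
        long now = clock.getAsLong();
        excludedUntil.values().removeIf(until -> now >= until); // drop expired entries
        return new HashSet<>(excludedUntil.keySet());
    }
}
```

The sketch keys on the full server name (host, port, start code), so a restarted server starts with a clean record. In the logs above, pumahbase132 came back with a new start code (1303107053536 instead of 1303046136711). Assignments to it still failed with "Connection refused". Keying on host and port alone would carry the history across restarts; which behaviour is preferable is a judgment call. The time-based expiry avoids excluding a server permanently.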