[ https://issues.apache.org/jira/browse/HBASE-16393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416295#comment-15416295 ]
binlijin commented on HBASE-16393:
----------------------------------
Master balancer's jstack
{code}
hbase(main):002:0> balancer
ERROR: Call id=3, waitTime=180001, operationTimeout=180000 expired.
{code}
{code}
"B.defaultRpcServer.handler=31,queue=5,port=60100" daemon prio=10 tid=0x00007f3e2aec1800 nid=0x369b2 in Object.wait() [0x00007f3e1affd000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:503)
    at org.apache.hadoop.ipc.Client.call(Client.java:1484)
    - locked <0x0000000603eb5738> (a org.apache.hadoop.ipc.Client$Call)
    at org.apache.hadoop.ipc.Client.call(Client.java:1429)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
    at com.sun.proxy.$Proxy15.getBlockLocations(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:254)
    at sun.reflect.GeneratedMethodAccessor75.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy16.getBlockLocations(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor75.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:330)
    at com.sun.proxy.$Proxy17.getBlockLocations(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor75.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:330)
    at com.sun.proxy.$Proxy17.getBlockLocations(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1205)
    at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1195)
    at org.apache.hadoop.hdfs.DFSClient.getBlockLocations(DFSClient.java:1245)
    at org.apache.hadoop.hdfs.DistributedFileSystem$1.doCall(DistributedFileSystem.java:220)
    at org.apache.hadoop.hdfs.DistributedFileSystem$1.doCall(DistributedFileSystem.java:216)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockLocations(DistributedFileSystem.java:216)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockLocations(DistributedFileSystem.java:208)
    at org.apache.hadoop.hbase.util.FSUtils.computeHDFSBlocksDistribution(FSUtils.java:1042)
    at org.apache.hadoop.hbase.regionserver.StoreFileInfo.computeHDFSBlocksDistributionInternal(StoreFileInfo.java:294)
    at org.apache.hadoop.hbase.regionserver.StoreFileInfo.computeHDFSBlocksDistribution(StoreFileInfo.java:284)
    at org.apache.hadoop.hbase.regionserver.HRegion.computeHDFSBlocksDistribution(HRegion.java:1083)
    at org.apache.hadoop.hbase.regionserver.HRegion.computeHDFSBlocksDistribution(HRegion.java:1058)
    at org.apache.hadoop.hbase.master.balancer.RegionLocationFinder.internalGetTopBlockLocation(RegionLocationFinder.java:127)
    at org.apache.hadoop.hbase.master.balancer.RegionLocationFinder$1.load(RegionLocationFinder.java:65)
    at org.apache.hadoop.hbase.master.balancer.RegionLocationFinder$1.load(RegionLocationFinder.java:61)
    at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3584)
    at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2372)
    at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2335)
    - locked <0x0000000603eabc40> (a com.google.common.cache.LocalCache$StrongAccessEntry)
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2250)
    at com.google.common.cache.LocalCache.get(LocalCache.java:3985)
    at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3989)
    at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4873)
    at org.apache.hadoop.hbase.master.balancer.RegionLocationFinder.getTopBlockLocations(RegionLocationFinder.java:105)
    at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer$Cluster.registerRegion(BaseLoadBalancer.java:433)
    at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer$Cluster.<init>(BaseLoadBalancer.java:274)
    at org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer$Cluster.<init>(BaseLoadBalancer.java:148)
    at org.apache.hadoop.hbase.master.balancer.SimpleLoadBalancer.balanceCluster(SimpleLoadBalancer.java:201)
    at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:1322)
    - locked <0x000000061bafbb00> (a org.apache.hadoop.hbase.master.balancer.SimpleLoadBalancer)
    at org.apache.hadoop.hbase.master.MasterRpcServices.balance(MasterRpcServices.java:395)
    at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:48508)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2188)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:102)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
    at java.lang.Thread.run(Thread.java:756)
{code}
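In the trace, the block-distribution lookup runs inline on the RPC handler thread: BaseLoadBalancer$Cluster.<init> calls registerRegion, which blocks on a namenode call through the RegionLocationFinder cache. As a rough illustration of spreading such per-region lookups across a thread pool, here is a minimal sketch using only java.util.concurrent. ParallelLocalitySketch, computeAll, and the lookup function are hypothetical names for illustration and are not HBase code; the real lookup would be whatever blocking call produces a region's distribution.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper, not HBase code: fans per-region lookups out to a thread pool.
class ParallelLocalitySketch {

    // 'lookup' stands in for the blocking per-region call (for example a namenode round trip).
    static <R> Map<String, R> computeAll(List<String> regions,
                                         Function<String, R> lookup,
                                         int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<R>> tasks = new ArrayList<>();
            for (String region : regions) {
                tasks.add(() -> lookup.apply(region));
            }
            // invokeAll blocks until every task has finished and returns futures in task order.
            List<Future<R>> futures = pool.invokeAll(tasks);
            Map<String, R> result = new LinkedHashMap<>();
            for (int i = 0; i < regions.size(); i++) {
                result.put(regions.get(i), futures.get(i).get());
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            throw new IllegalStateException("interrupted while waiting for lookups", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a region lookup failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

A caller would pass the region names and the blocking lookup, for example `ParallelLocalitySketch.computeAll(regionNames, name -> computeFor(name), 8)`, where `computeFor` and the pool size of 8 are placeholders. Whether this helps in practice depends on how much concurrent load the namenode can absorb.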
> Improve computeHDFSBlocksDistribution
> -------------------------------------
>
> Key: HBASE-16393
> URL: https://issues.apache.org/jira/browse/HBASE-16393
> Project: HBase
> Issue Type: Improvement
> Reporter: binlijin
>
> Because our cluster is big, I can see that the balancer is slow from time to
> time. The balancer is also called on master startup, so startup is slow as
> well.
> First, I think we could compute the HDFSBlocksDistribution of different
> regions in parallel.
> Second, I think we can improve how a single region's HDFSBlocksDistribution
> is computed.
> To compute a storefile's HDFSBlocksDistribution, we first call
> FileSystem#getFileStatus(path) and then
> FileSystem#getFileBlockLocations(status, start, length), so there are two
> namenode RPC calls for every storefile. Instead, we can use
> FileSystem#listLocatedStatus to get a LocatedFileStatus with the information
> we need, which reduces the namenode RPC calls to one.
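
To make the RPC arithmetic in the description concrete, here is a small self-contained model that only counts round trips. CountingNameNode and RpcCountSketch are toy stand-ins invented for this sketch; they borrow the method names from the description but are not the Hadoop FileSystem API, and the file path and host names in the usage note are made up.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for a namenode client; NOT the Hadoop FileSystem API.
// It only counts round trips so the two call patterns can be compared.
class CountingNameNode {
    int rpcCount = 0;
    private final Map<String, Long> lengths = new HashMap<>();
    private final Map<String, List<String>> hosts = new HashMap<>();

    void addFile(String path, long length, String... blockHosts) {
        lengths.put(path, length);
        hosts.put(path, Arrays.asList(blockHosts));
    }

    // Models FileSystem#getFileStatus: one round trip, returns the file length.
    long getFileStatus(String path) {
        rpcCount++;
        return lengths.get(path);
    }

    // Models FileSystem#getFileBlockLocations: one more round trip.
    List<String> getFileBlockLocations(String path, long start, long length) {
        rpcCount++;
        return hosts.get(path);
    }

    // Models FileSystem#listLocatedStatus: status and locations in one round trip.
    List<String> listLocatedStatus(String path) {
        rpcCount++;
        return hosts.get(path);
    }
}

class RpcCountSketch {
    // Pattern described in the issue: status first, then block locations.
    static List<String> twoCalls(CountingNameNode nn, String path) {
        long len = nn.getFileStatus(path);
        return nn.getFileBlockLocations(path, 0, len);
    }

    // Proposed pattern: a single located-status call.
    static List<String> oneCall(CountingNameNode nn, String path) {
        return nn.listLocatedStatus(path);
    }
}
```

After `nn.addFile("/hbase/t/r/cf/f1", 128L, "dn1", "dn2")`, calling `twoCalls` leaves `nn.rpcCount` at 2, while `oneCall` on a fresh counter leaves it at 1, with the same host list returned in both cases.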