[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lei Chen updated HBASE-13965:
-----------------------------
Attachment: HBASE-13965-branch-1-v2.patch
Updates:
1. Wrapped a long line (> 100 characters).
The test failure from the previous patch appears unrelated to this change. Here is the log:
testWalRollOnLowReplication(org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS) Time elapsed: 3.804 sec <<< ERROR!
java.lang.RuntimeException: sync aborted
    at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:491)
    at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.insert(WALProcedureStore.java:334)
    at org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS.testWalRollOnLowReplication(TestWALProcedureStoreOnHDFS.java:189)
Caused by: org.apache.hadoop.ipc.RemoteException: File /test-logs/state-00000000000000000006.log could only be replicated to 2 nodes instead of minReplication (=3). There are 3 datanode(s) running and 3 node(s) are excluded in this operation.
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1471)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2791)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:606)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:455)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
    at org.apache.hadoop.ipc.Client.call(Client.java:1411)
    at org.apache.hadoop.ipc.Client.call(Client.java:1364)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:368)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1449)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1270)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:526)
> Stochastic Load Balancer JMX Metrics
> ------------------------------------
>
> Key: HBASE-13965
> URL: https://issues.apache.org/jira/browse/HBASE-13965
> Project: HBase
> Issue Type: Improvement
> Components: Balancer, metrics
> Reporter: Lei Chen
> Assignee: Lei Chen
> Fix For: 2.0.0
>
> Attachments: 13965-addendum.txt, HBASE-13965-branch-1-v2.patch,
> HBASE-13965-branch-1.patch, HBASE-13965-v10.patch, HBASE-13965-v11.patch,
> HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch,
> HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch,
> HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png,
> HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png
>
>
> Today’s default HBase load balancer (the stochastic load balancer) is
> cost-function based. The cost function weights are tunable, but the balancer
> provides no direct visibility into the cost function results.
> A driving example is a cluster we have been tuning that has skewed rack sizes
> (one rack has half as many nodes as each of the few other racks). We are
> tuning the cluster for uniform response time from all region servers, with the
> ability to tolerate a rack failure. Balancing LocalityCost,
> RegionReplicaRackCost and RegionCountSkewCost is difficult without a way to
> attribute each cost function’s contribution to the overall cost.
> This JIRA proposes providing visibility via JMX into each cost function of
> the stochastic load balancer, as well as into the overall cost of the
> balancing plan.
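As a rough illustration of the attribution the description asks for, the sketch below computes an overall cost as a weighted sum of per-function costs and reports each function's share of it. This is not the StochasticLoadBalancer code. The class CostBreakdownSketch, its methods, and the weights and raw costs are all made up for illustration, and the weighted-sum form and the [0, 1] cost range are assumptions. Exposing the values via JMX is not shown.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: NOT the HBase StochasticLoadBalancer implementation.
final class CostBreakdownSketch {

  // One cost function: a tunable weight and a raw cost assumed to lie in [0, 1].
  static final class CostTerm {
    final String name;
    final double weight;
    final double cost;

    CostTerm(String name, double weight, double cost) {
      this.name = name;
      this.weight = weight;
      this.cost = cost;
    }

    double weighted() {
      return weight * cost;
    }
  }

  // Overall cost, assumed here to be the weighted sum of the individual costs.
  static double overallCost(List<CostTerm> terms) {
    double total = 0.0;
    for (CostTerm t : terms) {
      total += t.weighted();
    }
    return total;
  }

  // Fraction of the overall cost attributable to each function (0 when the total is 0).
  static Map<String, Double> contributions(List<CostTerm> terms) {
    double total = overallCost(terms);
    Map<String, Double> shares = new LinkedHashMap<>();
    for (CostTerm t : terms) {
      shares.put(t.name, total == 0.0 ? 0.0 : t.weighted() / total);
    }
    return shares;
  }

  public static void main(String[] args) {
    // Illustrative weights and costs only; these are not HBase defaults.
    List<CostTerm> terms = Arrays.asList(
        new CostTerm("LocalityCost", 2.0, 0.40),
        new CostTerm("RegionReplicaRackCost", 10.0, 0.05),
        new CostTerm("RegionCountSkewCost", 5.0, 0.20));

    System.out.printf("overall cost = %.3f%n", overallCost(terms));
    for (Map.Entry<String, Double> e : contributions(terms).entrySet()) {
      System.out.printf("%-22s share = %.1f%%%n", e.getKey(), e.getValue() * 100.0);
    }
  }
}
```

With a per-function breakdown like this, it becomes visible when one heavily weighted function dominates the total, which is the situation that is hard to diagnose from the overall cost alone.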