[
https://issues.apache.org/jira/browse/HBASE-15594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229310#comment-15229310
]
stack commented on HBASE-15594:
-------------------------------
bq. Checking the code, I'm afraid with hbase.client.ipc.pool.size set to 100
there would still be separate connection created for the same ConnectionId, so
the setting would cause the same problem as the YCSB-651 bug.
How do you make that out [~carp84]? There is no ConnectionId going on (at least
at the tip of branch-1 that I am working on), and if I look at the stack traces
with hbase.client.ipc.pool.size == #cpus, it seems well-behaved. I count #cpu
connections.
On the other hand, there is a zk connection issue: each connection gets its own
zk connection, which makes no sense. Connections should be sharing a zk
connection. Each ycsb instance has hundreds of zk threads... (for 48
connections, there are 248 zk threads). There were so many zk threads that they
became the bottleneck when I tried to put more load on the cluster. I had to up
hbase.zookeeper.property.maxClientCnxns on the ensemble from 300 to 3000; the
ensemble logs were complaining that we were at max connections. Need to fix
this.
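For reference, a minimal sketch of that bump, assuming an HBase-managed
ZooKeeper that picks the setting up from hbase-site.xml (with an external
ensemble the equivalent knob is maxClientCnxns in zoo.cfg):
{code}
<!-- hbase-site.xml on the ensemble hosts: raise the per-client connection cap -->
<property>
  <name>hbase.zookeeper.property.maxClientCnxns</name>
  <!-- default is 300; 3000 keeps the many ycsb client connections under the cap -->
  <value>3000</value>
</property>
{code}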
Setting hbase.client.ipc.pool.size to #cpus doubles my ycsb throughput. At the
default of 1, or at 2 or 12, my throughput is way less.
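Concretely, the client-side setting I mean is just this (48 here stands for
#cpus on my client box; adjust to your own core count):
{code}
<!-- hbase-site.xml on the ycsb client: one ipc connection per cpu -->
<property>
  <name>hbase.client.ipc.pool.size</name>
  <value>48</value>
</property>
{code}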
> [YCSB] Improvements
> -------------------
>
> Key: HBASE-15594
> URL: https://issues.apache.org/jira/browse/HBASE-15594
> Project: HBase
> Issue Type: Umbrella
> Reporter: stack
> Priority: Critical
>
> Running YCSB and getting good results is an arcane art. For example, in my
> testing, a few handlers (100) with as many readers as I had CPUs (48), and
> upping connections on clients to the same as #cpus, made for 2-3x the
> throughput. The above config changes came of lore; which configurations need
> tweaking is not obvious going by their names; there were no indications from
> the app on where/why we were blocked or on which metrics are important to
> consider. Nor was any of this written down in the docs.
> Even still, I am stuck trying to make use of all of the machine. I am unable
> to overrun a server even with 8 client nodes trying to beat up a single node
> (workloadc, all random-read, with no data returned: -p readallfields=false).
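> For concreteness, a hypothetical invocation along those lines (the binding
> name, table, columnfamily, and thread count here are illustrative, not the
> exact command I ran):
> {code}
> # workloadc = 100% random reads; readallfields=false so no row data comes back
> bin/ycsb run hbase10 -P workloads/workloadc \
>   -p table=usertable -p columnfamily=family \
>   -p readallfields=false -threads 48
> {code}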
> There is also a strange phenomenon where, when I add a few machines,
> throughput does not scale: with 3 nodes in the cluster, rather than 3x the
> YCSB throughput, each machine instead does about 1/3rd of the single-node rate.
> This umbrella issue is to host items that improve our defaults and to note
> how to get good numbers running YCSB. In particular, I want to be able to
> saturate a machine.
> Here are the configs I'm currently working with. I've not done the work to
> figure out whether the client-side settings are optimal (it is weird how big
> a difference client-side changes can make -- need to fix this). On my 48 cpu
> machine, I can do about 370k random reads a second from data totally cached
> in bucketcache. If I short-circuit the user Gets so they don't do any work
> but return immediately, I can do 600k ops a second, but the CPUs are only at
> 60-70%. I cannot get them to go above this. Working on it.
> {code}
> <property>
>   <name>hbase.ipc.server.read.threadpool.size</name>
>   <value>48</value>
> </property>
> <property>
>   <name>hbase.regionserver.handler.count</name>
>   <value>100</value>
> </property>
> <property>
>   <name>hbase.client.ipc.pool.size</name>
>   <value>100</value>
> </property>
> <property>
>   <name>hbase.htable.threads.max</name>
>   <value>48</value>
> </property>
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)