[
https://issues.apache.org/jira/browse/HBASE-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000661#comment-13000661
]
ryan rawson commented on HBASE-2939:
------------------------------------
I ran this on a small test cluster and my results were mixed, largely due to
my setup and benchmarking method.
- using YCSB with a small working set, I loaded data onto regionserver R1
- ran YCSB from master M, issuing small 300-row scans with 100 columns each
  (roughly the shape sketched below)
- ran both with this patch and without.
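For reference, a minimal sketch of roughly that scan shape, written against
the 0.90-era client API (the table name and start row are placeholders, not
my exact harness):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ScanShape {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "usertable"); // YCSB's default table
        Scan scan = new Scan(Bytes.toBytes("user100")); // placeholder start row
        scan.setCaching(300); // fetch all 300 rows in a single round trip
        ResultScanner scanner = table.getScanner(scan);
        try {
          int rows = 0;
          for (Result r : scanner) {
            if (++rows == 300) break; // 300-row scan, ~100 columns per row
          }
        } finally {
          scanner.close();
          table.close();
        }
      }
    }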
The runs with this patch showed a slight but not significant improvement. I
think that is because I was saturating the network between M and R1, which
left the extra sockets little room to help. I didn't capture the exact
numbers, but it was something like 390 ms without the patch and 340 ms with
it.
It was a fairly interesting case, since I was deliberately testing contention
on the single socket, and adding more sockets to an already saturated network
did not improve things as I had hoped.
Could you paste in your test scenario? I could see that for some workloads
involving small to medium gets, this could improve latency.
> Allow Client-Side Connection Pooling
> ------------------------------------
>
> Key: HBASE-2939
> URL: https://issues.apache.org/jira/browse/HBASE-2939
> Project: HBase
> Issue Type: Improvement
> Components: client
> Affects Versions: 0.89.20100621
> Reporter: Karthick Sankarachary
> Assignee: ryan rawson
> Priority: Critical
> Fix For: 0.92.0
>
> Attachments: HBASE-2939-0.20.6.patch, HBASE-2939.patch,
> HBASE-2939.patch
>
>
> By design, the HBase RPC client multiplexes calls to a given region server
> (or the master for that matter) over a single socket, access to which is
> managed by a connection thread defined in the HBaseClient class. While this
> approach may suffice for most cases, it tends to break down in the context of
> a real-time, multi-threaded server, where latencies need to be lower and
> throughputs higher.
> In brief, the problem is that we dedicate one thread to handle all
> client-side reads and writes for a given server, which in turn forces them to
> share the same socket. As load increases, this is bound to serialize calls on
> the client-side. In particular, when the rate at which calls are submitted to
> the connection thread is greater than that at which the server responds, then
> some of those calls will inevitably end up sitting idle, just waiting their
> turn to go over the wire.
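> To make the bottleneck concrete, here is a rough sketch (names and sizes are
> illustrative) of the kind of workload that hits it: many client threads, each
> with its own HTable, whose RPCs to the same regionserver all funnel through
> the one shared socket:
>
>     import java.util.concurrent.ExecutorService;
>     import java.util.concurrent.Executors;
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.hbase.HBaseConfiguration;
>     import org.apache.hadoop.hbase.client.Get;
>     import org.apache.hadoop.hbase.client.HTable;
>     import org.apache.hadoop.hbase.util.Bytes;
>
>     public class GetStorm {
>       public static void main(String[] args) throws Exception {
>         final Configuration conf = HBaseConfiguration.create();
>         ExecutorService exec = Executors.newFixedThreadPool(16);
>         for (int i = 0; i < 16; i++) {
>           final int id = i;
>           exec.submit(new Runnable() {
>             public void run() {
>               try {
>                 // One HTable per thread (HTable is not thread-safe), yet
>                 // all RPCs to a given server share one socket internally.
>                 HTable table = new HTable(conf, "mytable");
>                 for (int n = 0; n < 1000; n++) {
>                   table.get(new Get(Bytes.toBytes("row-" + id + "-" + n)));
>                 }
>                 table.close();
>               } catch (Exception e) {
>                 e.printStackTrace();
>               }
>             }
>           });
>         }
>         exec.shutdown();
>       }
>     }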
> In general, sharing sockets across multiple client threads is a good idea,
> but limiting the number of such sockets to one may be overly restrictive for
> certain cases. Here, we propose a way of defining multiple sockets per server
> endpoint, access to which may be managed through either a load-balancing or
> thread-local pool. To that end, we define the notion of a SharedMap, which
> maps a key to a resource pool, and supports both of those pool types.
> Specifically, we will apply that map in the HBaseClient, to associate
> multiple connection threads with each server endpoint (denoted by a
> connection id).
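> A rough sketch of the shape such a map might take (the names and signatures
> here are illustrative, not necessarily the patch's exact API):
>
>     // Maps a key (e.g. a connection id) to a pool of resources (e.g.
>     // connection threads); get() draws from that key's pool according to
>     // the configured pool type.
>     public interface SharedMap<K, R> {
>       R get(K key);          // fetch a resource for this key
>       R put(K key, R value); // add a resource to this key's pool
>       int size(K key);       // number of resources pooled for this key
>     }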
> Currently, the SharedMap supports the following types of pools (each is
> sketched in code after this list):
> * A ThreadLocalPool, which represents a pool that builds on the
> ThreadLocal class. It essentially binds the resource to the thread from which
> it is accessed.
> * A ReusablePool, which represents a pool that builds on the LinkedList
> class. It essentially allows resources to be checked out, at which point they
> are (temporarily) removed from the pool. When a resource is no longer
> required, it should be returned to the pool so that it can be reused.
> * A RoundRobinPool, which represents a pool that stores its resources in
> an ArrayList. It load-balances access to its resources by returning a
> different resource every time a given key is looked up.
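> Simplified sketches of the three pool flavors (illustrative only; the
> patch's actual classes may differ):
>
>     import java.util.ArrayList;
>     import java.util.LinkedList;
>
>     // Binds one resource to each calling thread.
>     class ThreadLocalPool<R> {
>       private final ThreadLocal<R> local = new ThreadLocal<R>();
>       R get() { return local.get(); }
>       void put(R r) { local.set(r); }
>     }
>
>     // Checked-out resources leave the pool and must be returned for reuse.
>     class ReusablePool<R> {
>       private final LinkedList<R> pool = new LinkedList<R>();
>       R get() { return pool.poll(); } // null when none are available
>       void put(R r) { pool.add(r); }
>     }
>
>     // Hands back a different resource on each lookup.
>     class RoundRobinPool<R> {
>       private final ArrayList<R> pool = new ArrayList<R>();
>       private int next = 0;
>       R get() {
>         if (pool.isEmpty()) return null;
>         next = (next + 1) % pool.size();
>         return pool.get(next);
>       }
>       void put(R r) { pool.add(r); }
>     }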
> To control the type and size of the connection pools, we give the user two
> parameters (viz. "hbase.client.ipc.pool.type" and
> "hbase.client.ipc.pool.size"). If the pool size is set to a positive number,
> that value caps the number of resources a pool may contain for any given key.
> A size of Integer#MAX_VALUE is interpreted to mean an unbounded pool.
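> For instance, with the property names from this issue (the accepted values
> for the pool type are my assumption, e.g. "round-robin"):
>
>     Configuration conf = HBaseConfiguration.create();
>     // The pool-type value name is assumed for illustration.
>     conf.set("hbase.client.ipc.pool.type", "round-robin");
>     // Cap of 10 connections per server endpoint; Integer.MAX_VALUE
>     // would mean an unbounded pool.
>     conf.setInt("hbase.client.ipc.pool.size", 10);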