Improving Client-Side Throughput Of HBase
-----------------------------------------

                 Key: HBASE-2939
                 URL: https://issues.apache.org/jira/browse/HBASE-2939
             Project: HBase
          Issue Type: Improvement
          Components: client
    Affects Versions: 0.89.20100621
            Reporter: Karthick Sankarachary


By design, the HBase RPC client multiplexes calls to a given region server (or 
the master, for that matter) over a single socket, access to which is managed 
by a connection thread defined in the HBaseClient class. While this approach 
may suffice for most cases, it tends to break down in a real-time, 
multi-threaded server, where latency needs to be lower and throughput higher. 

In brief, the problem is that we dedicate one thread to handle all client-side 
reads and writes for a given server, which in turn forces them to share the 
same socket. As load increases, this is bound to serialize calls on the 
client side. In particular, when calls are submitted to the connection thread 
faster than the server responds, some of those calls will inevitably sit idle, 
waiting their turn to go over the wire.

In general, sharing sockets across multiple client threads is a good idea, but 
limiting the number of such sockets to one may be overly restrictive for 
certain cases. Here, we propose a way of defining multiple sockets per server 
endpoint, access to which may be managed through either a load-balancing or 
thread-local pool. To that end, we define the notion of a SharedMap, which maps 
a key to a resource pool, and supports both of those pool types. Specifically, 
we will apply that map in the HBaseClient, to associate multiple connection 
threads with each server endpoint (denoted by a connection id). 

Currently, the SharedMap supports the following types of pools:

    * A ThreadLocalPool, which builds on the ThreadLocal class. It binds each 
      resource to the thread from which it is accessed.
    * A ReusablePool, which builds on the LinkedList class. It allows a 
      resource to be checked out, at which point it is (temporarily) removed 
      from the pool. When the resource is no longer required, it should be 
      returned to the pool so that it can be reused.
    * A RoundRobinPool, which stores its resources in an ArrayList. It 
      load-balances access by returning the next resource in turn each time 
      a given key is looked up.
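
The three pool types above can be sketched roughly as follows. This is only an 
illustration of the described behavior, not the code in the patch: the Pool 
interface, the get/put method names, the constructor arguments and the 
PoolSketch wrapper class are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch: names and signatures are illustrative, not the patch's API.
class PoolSketch {
    /** Assumed minimal pool contract for this example. */
    interface Pool<R> {
        R get();               // obtain a resource, or null if none is available
        void put(R resource);  // add (or return) a resource to the pool
    }

    /** Binds one resource to each calling thread via ThreadLocal. */
    static final class ThreadLocalPool<R> implements Pool<R> {
        private final ThreadLocal<R> local = new ThreadLocal<>();
        public R get() { return local.get(); }
        public void put(R resource) { local.set(resource); }
    }

    /** Check-out/check-in pool backed by a LinkedList. */
    static final class ReusablePool<R> implements Pool<R> {
        private final LinkedList<R> idle = new LinkedList<>();
        private final int maxSize;
        ReusablePool(int maxSize) { this.maxSize = maxSize; }
        // Checking out removes the resource from the pool until it is put back.
        public synchronized R get() { return idle.poll(); }
        public synchronized void put(R resource) {
            if (idle.size() < maxSize) idle.add(resource); // drop extras beyond the cap
        }
    }

    /** Cycles through the resources stored in an ArrayList. */
    static final class RoundRobinPool<R> implements Pool<R> {
        private final List<R> resources = new ArrayList<>();
        private final int maxSize;
        private int next = 0;
        RoundRobinPool(int maxSize) { this.maxSize = maxSize; }
        public synchronized R get() {
            if (resources.isEmpty()) return null;
            R resource = resources.get(next);
            next = (next + 1) % resources.size(); // advance to the next slot
            return resource;
        }
        public synchronized void put(R resource) {
            if (resources.size() < maxSize) resources.add(resource);
        }
    }

    public static void main(String[] args) {
        RoundRobinPool<String> pool = new RoundRobinPool<>(2);
        pool.put("conn-0");
        pool.put("conn-1");
        pool.put("conn-2"); // ignored: the pool is capped at 2
        // prints: conn-0 conn-1 conn-0
        System.out.println(pool.get() + " " + pool.get() + " " + pool.get());
    }
}
```

In this sketch a null return from get() signals that the caller would need to 
create a new resource and put() it; how the actual patch handles that case is 
not stated here.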

To control the type and size of the connection pools, we give the user two 
parameters, "hbase.client.ipc.pool.type" and "hbase.client.ipc.pool.size". If 
the pool size is set to a positive number, that number caps the number of 
resources that a pool may contain for any given key. A size of 
Integer#MAX_VALUE is interpreted to mean an unbounded pool.
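
As a rough illustration of how those two parameters might be interpreted, the 
sketch below reads them from a java.util.Properties object, which stands in for 
the HBase configuration only to keep the example self-contained. The PoolType 
enum constants, the default values, and the fallback for non-positive sizes are 
assumptions of this example; the description above does not specify them.

```java
import java.util.Locale;
import java.util.Properties;

// Hypothetical sketch: only the two parameter names come from the issue text.
class PoolConfigSketch {
    static final String POOL_TYPE_KEY = "hbase.client.ipc.pool.type";
    static final String POOL_SIZE_KEY = "hbase.client.ipc.pool.size";

    /** Assumed constant names, derived from the three pool classes. */
    enum PoolType { THREAD_LOCAL, REUSABLE, ROUND_ROBIN }

    /** Returns the per-key cap; a positive value is used as given. */
    static int poolSize(Properties conf) {
        int size = Integer.parseInt(conf.getProperty(POOL_SIZE_KEY, "1"));
        // Assumption: non-positive values fall back to a single resource.
        return size > 0 ? size : 1;
    }

    /** Integer.MAX_VALUE is treated as "no cap". */
    static boolean isUnbounded(int size) {
        return size == Integer.MAX_VALUE;
    }

    /** Parses the pool type; throws IllegalArgumentException on unknown names. */
    static PoolType poolType(Properties conf) {
        // Assumption: round-robin is the default when the key is absent.
        String raw = conf.getProperty(POOL_TYPE_KEY, "ROUND_ROBIN");
        return PoolType.valueOf(raw.trim().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(POOL_TYPE_KEY, "round_robin");
        conf.setProperty(POOL_SIZE_KEY, "4");
        // prints: ROUND_ROBIN 4
        System.out.println(poolType(conf) + " " + poolSize(conf));
    }
}
```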

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
