Enis Soztutar created HBASE-16713:
-------------------------------------

             Summary: Bring back connection caching as a client API
                 Key: HBASE-16713
                 URL: https://issues.apache.org/jira/browse/HBASE-16713
             Project: HBase
          Issue Type: New Feature
          Components: Client
            Reporter: Enis Soztutar
             Fix For: 2.0.0, 1.4.0


{{Connection.getConnection()}} was removed in master for good reasons: the 
connection lifecycle should always be explicit. We have replaced some of its 
functionality with {{ConnectionCache}} in the REST and Thrift servers 
internally, but it is not exposed to clients. 
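
For context, this is the explicit lifecycle clients already use on master 
(standard client API via {{ConnectionFactory}}):

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;

public class ExplicitLifecycle {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    // The caller owns the Connection and must close it; try-with-resources
    // makes the lifecycle explicit.
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Table table = connection.getTable(TableName.valueOf("t1"))) {
      // short-lived work against the table goes here
    }
  }
}
{code}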

It turns out that our friends doing the hbase-spark connector work need 
connection caching behavior similar to what we have in the REST and Thrift 
servers. At a high level we want the following (a usage sketch follows the 
list): 
 - Spark executors should be able to run short-lived HBase tasks with low 
latency. 
 - Short-lived tasks should be able to share the same connection, and should 
not pay the price of instantiating the cluster connection (which means a zk 
connection, meta cache, 200+ threads, etc.). 
 - Connections to the cluster should be closed if they are not used for some 
time, since Spark executors are used for other tasks as well. 
 - Spark jobs may be launched with different configuration objects, possibly 
connecting to different clusters in different jobs. 
 - Although not a direct requirement for Spark, different users should not 
share the same connection object. 
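
For illustration, here is a minimal sketch of how a short-lived task on a 
Spark executor might use such a cache. The {{ConnectionCache}} client API does 
not exist yet, so the {{create()}} factory and methods below are assumptions 
(a sketch of the interface itself follows further down):

{code:java}
import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;

public class ExecutorTask {
  // One cache per executor JVM; idle connections would be closed by the
  // cache after the (hypothetical) maxIdleMs timeout.
  private static final ConnectionCache CACHE =
      ConnectionCache.create(HBaseConfiguration.create(), 10 * 60 * 1000 /* maxIdleMs */);

  public Result lookup(byte[] row) throws IOException {
    // The short-lived task reuses the cached cluster connection instead of
    // paying for a new zk connection, meta cache, and thread pools.
    Connection conn = CACHE.getConnection();
    try (Table table = conn.getTable(TableName.valueOf("t1"))) {
      return table.get(new Get(row));
    }
    // Note that the task does not close conn; the cache owns its lifecycle.
  }
}
{code}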

Looking at the old {{ConnectionManager}} and managed-connection code that we 
have in branch-1, and the code in {{ConnectionCache}}, I think we should 
introduce a first-class client-level API called {{ConnectionCache}} which will 
be a hybrid between the internal {{ConnectionCache}} and the old 
{{ConnectionManager}}. The lifecycle of the {{ConnectionCache}} is still 
explicit, so API-design-wise I think this will fit into the current model. 
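
To make the proposal concrete, here is a rough sketch of what the interface 
could look like. Every name and signature below is an assumption for 
discussion, not a final design:

{code:java}
import java.io.Closeable;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.security.User;

// Hypothetical client-level API; all names and signatures are assumptions.
public interface ConnectionCache extends Closeable {

  /** Creates a cache for the given cluster configuration and idle timeout. */
  static ConnectionCache create(Configuration conf, long maxIdleMs) {
    throw new UnsupportedOperationException("sketch only, not implemented");
  }

  /** Returns a shared Connection for the current user, creating it on demand. */
  Connection getConnection() throws IOException;

  /**
   * Returns a shared Connection for the given user; connections are keyed by
   * user (and configuration), so different users never share one.
   */
  Connection getConnection(User user) throws IOException;

  /** Explicit lifecycle: closing the cache closes all cached connections. */
  @Override
  void close() throws IOException;
}
{code}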




