[jira] [Commented] (HBASE-16713) Bring back connection caching as a client API

Allan Yang (JIRA) Thu, 29 Jun 2017 19:51:11 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-16713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069388#comment-16069388
 ]


Allan Yang commented on HBASE-16713:
------------------------------------

Yes, please bring connection caching back. Currently, we have to use the 
deprecated ConnectionManager.getConnection() in branch-1. 
We have cases similar to Spark, where many short-lived threads access 
HBase. If connections are not shared, there may be too many concurrent 
ZooKeeper connections. 


> Bring back connection caching as a client API
> ---------------------------------------------
>
>                 Key: HBASE-16713
>                 URL: https://issues.apache.org/jira/browse/HBASE-16713
>             Project: HBase
>          Issue Type: New Feature
>          Components: Client, spark
>            Reporter: Enis Soztutar
>             Fix For: 2.0.0
>
>
> Connection.getConnection() is removed in master for good reasons. The 
> connection lifecycle should always be explicit. We have replaced some of the 
> functionality with ConnectionCache for rest and thrift servers internally, 
> but it is not exposed to clients. 
> Turns out our friends doing the hbase-spark connector work need connection 
> caching behavior similar to what we have in the rest and thrift servers. At a 
> higher level we want: 
>  - Spark executors should be able to run short-lived hbase tasks with low 
> latency 
>  - Short-lived tasks should be able to share the same connection, and should 
> not pay the price of instantiating the cluster connection (which means zk 
> connection, meta cache, 200+ threads, etc)
>  - Connections to the cluster should be closed if they are not used for some 
> time. Spark executors are used for other tasks as well. 
>  - Spark jobs may be launched with different configuration objects, possibly 
> connecting to different clusters between different jobs. 
>  - Although not a direct requirement for spark, different users should not 
> share the same connection object. 
> Looking at the old code that we have in branch-1 for {{ConnectionManager}}, 
> managed connections and the code in ConnectionCache, I think we should add a 
> first-class client-level API called ConnectionCache, which will be a hybrid 
> between the internal ConnectionCache and the old ConnectionManager. The 
> lifecycle of the ConnectionCache is still explicit, so I think 
> API-design-wise, this will fit into the current model. 
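
The requirements listed in the description (sharing between short-lived 
tasks, one connection per configuration and per user, closing after an idle 
period) could be sketched roughly as follows. This is only an illustration 
under stated assumptions: SharedConnectionCache, Key, get() and evictIdle() 
are hypothetical names, not HBase APIs, and the connection type is a generic 
AutoCloseable stand-in rather than a real HBase Connection. 

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical sketch of an idle-evicting connection cache; not an HBase API. */
final class SharedConnectionCache<C extends AutoCloseable> {

    /** One cached connection per (cluster, user) pair, so users never share. */
    static final class Key {
        final String clusterId;
        final String user;

        Key(String clusterId, String user) {
            this.clusterId = clusterId;
            this.user = user;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return clusterId.equals(k.clusterId) && user.equals(k.user);
        }

        @Override public int hashCode() {
            return Objects.hash(clusterId, user);
        }
    }

    /** A connection plus the time it was last handed out. */
    private static final class Entry<T> {
        final T conn;
        volatile long lastUsedMillis;

        Entry(T conn, long now) {
            this.conn = conn;
            this.lastUsedMillis = now;
        }
    }

    private final Map<Key, Entry<C>> entries = new ConcurrentHashMap<>();
    private final Function<Key, C> factory;   // creates the expensive connection
    private final LongSupplier clock;         // injectable so idle time is testable
    private final long maxIdleMillis;

    SharedConnectionCache(Function<Key, C> factory, LongSupplier clock, long maxIdleMillis) {
        this.factory = factory;
        this.clock = clock;
        this.maxIdleMillis = maxIdleMillis;
    }

    /** Returns the shared connection for the key, creating it on first use. */
    C get(Key key) {
        long now = clock.getAsLong();
        return entries.compute(key, (k, old) -> {
            if (old == null) return new Entry<>(factory.apply(k), now);
            old.lastUsedMillis = now;
            return old;
        }).conn;
    }

    /**
     * Closes and removes connections idle for at least maxIdleMillis.
     * Simplification: a caller may still hold a connection that gets closed
     * here; a real cache would need reference counting or leases.
     */
    int evictIdle() {
        long now = clock.getAsLong();
        int closed = 0;
        for (Map.Entry<Key, Entry<C>> me : entries.entrySet()) {
            Entry<C> e = me.getValue();
            if (now - e.lastUsedMillis >= maxIdleMillis && entries.remove(me.getKey(), e)) {
                try {
                    e.conn.close();
                } catch (Exception ignored) {
                    // best-effort close in this sketch
                }
                closed++;
            }
        }
        return closed;
    }

    int size() {
        return entries.size();
    }
}
```

In this sketch the cache itself still has an explicit lifecycle: the owner 
decides when to call evictIdle() (for example from a periodic chore), which 
matches the point above that connection lifecycle should stay explicit. 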



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
