[ https://issues.apache.org/jira/browse/HBASE-16713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16070491#comment-16070491 ]

Enis Soztutar commented on HBASE-16713:
---------------------------------------

This is more generic than the hbase-spark module. There seem to be valid use 
cases that depend on connection caching, and we are basically regressing on 
that front. Current users of cached connections have no alternative other than 
implementing their own hacked-up solutions. It is better that we provide them 
with a proper implementation. 

I think I have a WIP patch somewhere. Let me attach it here as a starter. 
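To make the shape concrete, below is a minimal sketch of what such an API 
could look like. Everything in it is illustrative and not from the patch: the 
class name, the idle-eviction chore, and keying connections by the zk quorum 
are all assumptions. A real implementation would also key by user and 
reference-count connections so that eviction cannot close a connection that is 
still in use. 

{code:java}
// Illustrative sketch only -- not the attached patch. Caches cluster
// connections keyed by configuration, shares them across callers, and
// evicts connections that have been idle longer than a timeout.
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class ConnectionCache implements Closeable {

  private static final class Entry {
    final Connection connection;
    volatile long lastAccessed;
    Entry(Connection connection) {
      this.connection = connection;
      this.lastAccessed = System.currentTimeMillis();
    }
  }

  private final Map<String, Entry> cache = new ConcurrentHashMap<>();
  private final long idleTimeoutMs;
  private final ScheduledExecutorService evictor =
      Executors.newSingleThreadScheduledExecutor();

  public ConnectionCache(long idleTimeoutMs) {
    this.idleTimeoutMs = idleTimeoutMs;
    // Periodically close and drop connections unused for idleTimeoutMs.
    evictor.scheduleAtFixedRate(this::evictIdle,
        idleTimeoutMs, idleTimeoutMs, TimeUnit.MILLISECONDS);
  }

  /** Returns the shared connection for this configuration, creating it on first use. */
  public Connection getConnection(Configuration conf) throws IOException {
    // Key by cluster identity; zk quorum + znode parent is a crude but
    // workable proxy. A real implementation would also key by user.
    String key = conf.get("hbase.zookeeper.quorum") + "/"
        + conf.get("zookeeper.znode.parent", "/hbase");
    try {
      Entry entry = cache.computeIfAbsent(key, k -> {
        try {
          return new Entry(ConnectionFactory.createConnection(conf));
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
      entry.lastAccessed = System.currentTimeMillis();
      return entry.connection;
    } catch (UncheckedIOException e) {
      throw e.getCause();
    }
  }

  private void evictIdle() {
    long cutoff = System.currentTimeMillis() - idleTimeoutMs;
    // Simplified: without reference counting this can close a connection a
    // caller is still holding; a real implementation must guard against that.
    cache.entrySet().removeIf(e -> {
      if (e.getValue().lastAccessed < cutoff) {
        try { e.getValue().connection.close(); } catch (IOException ignored) { }
        return true;
      }
      return false;
    });
  }

  @Override
  public void close() throws IOException {
    // Lifecycle stays explicit: the owner closes the cache, which closes
    // every cached connection.
    evictor.shutdownNow();
    for (Entry entry : cache.values()) {
      entry.connection.close();
    }
    cache.clear();
  }
}
{code}

Callers would own the cache's lifecycle explicitly: create one ConnectionCache 
per executor, call getConnection(conf) from each short-lived task, and close() 
it on shutdown. 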



> Bring back connection caching as a client API
> ---------------------------------------------
>
>                 Key: HBASE-16713
>                 URL: https://issues.apache.org/jira/browse/HBASE-16713
>             Project: HBase
>          Issue Type: New Feature
>          Components: Client, spark
>            Reporter: Enis Soztutar
>             Fix For: 2.0.0
>
>
> Connection.getConnection() is removed in master for good reasons. The 
> connection lifecycle should always be explicit. We have replaced some of the 
> functionality with ConnectionCache for the REST and Thrift servers 
> internally, but it is not exposed to clients. 
> It turns out our friends doing the hbase-spark connector work need connection 
> caching behavior similar to what we have in the REST and Thrift servers. At a 
> higher level we want: 
>  - Spark executors should be able to run short-lived HBase tasks with low 
> latency 
>  - Short-lived tasks should be able to share the same connection, and should 
> not pay the price of instantiating a cluster connection (which means a ZK 
> connection, meta cache, 200+ threads, etc.)
>  - Connections to the cluster should be closed if they are not used for some 
> time; Spark executors are used for other tasks as well. 
>  - Spark jobs may be launched with different configuration objects, possibly 
> connecting to different clusters across different jobs. 
>  - Although not a direct requirement for Spark, different users should not 
> share the same connection object. 
> Looking at the old code that we have in branch-1 for {{ConnectionManager}} 
> and managed connections, and the code in ConnectionCache, I think we should 
> build a first-class client-level API called ConnectionCache, which would be 
> a hybrid between the existing ConnectionCache and the old ConnectionManager. 
> The lifecycle of the ConnectionCache is still explicit, so API-design-wise I 
> think this will fit into the current model. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
