[ https://issues.apache.org/jira/browse/HBASE-17009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-17009:
--------------------------
    Fix Version/s: 2.0.0

> Revisiting the removal of managed connection and connection caching
> -------------------------------------------------------------------
>
>                 Key: HBASE-17009
>                 URL: https://issues.apache.org/jira/browse/HBASE-17009
>             Project: HBase
>          Issue Type: Task
>          Components: Operability
>            Reporter: Yu Li
>            Assignee: Yu Li
>            Priority: Critical
>             Fix For: 2.0.0
>
>
> In HBASE-13197 we did a lot of good cleanup on the Connection API, but as 
> part of it HBASE-13252 dropped the managed connection and connection caching 
> feature, and this JIRA proposes revisiting that decision for the reasons below.
> Assume we have a long-running process with multiple threads accessing HBase 
> (a common case for streaming applications); let's compare what happened 
> previously with what happens now.
> Previously:
> Users could create an HTable instance whenever they wanted without worrying 
> about the underlying connections, because the HBase client managed them 
> automatically: no matter how many threads there were, only one Connection 
> instance was created.
> {code}
>   @Deprecated
>   public HTable(Configuration conf, final TableName tableName)
>   throws IOException {
>     ...
>     this.connection = ConnectionManager.getConnectionInternal(conf);
>     ...
>   }
>   static ClusterConnection getConnectionInternal(final Configuration conf)
>     throws IOException {
>     HConnectionKey connectionKey = new HConnectionKey(conf);
>     synchronized (CONNECTION_INSTANCES) {
>       HConnectionImplementation connection = CONNECTION_INSTANCES.get(connectionKey);
>       if (connection == null) {
>         connection = (HConnectionImplementation)createConnection(conf, true);
>         CONNECTION_INSTANCES.put(connectionKey, connection);
>       } else if (connection.isClosed()) {
>         ConnectionManager.deleteConnection(connectionKey, true);
>         connection = (HConnectionImplementation)createConnection(conf, true);
>         CONNECTION_INSTANCES.put(connectionKey, connection);
>       }
>       connection.incCount();
>       return connection;
>     }
>   }
> {code}
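> For illustration, here is a minimal sketch (not from the issue; the table name 
> "t1", family "f" and class name are made up) of the per-thread code users 
> typically wrote under the old model, relying on the deprecated HTable 
> constructor and its internal connection cache:
> {code}
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.TableName;
> import org.apache.hadoop.hbase.client.HTable;
> import org.apache.hadoop.hbase.client.Put;
> import org.apache.hadoop.hbase.util.Bytes;
>
> // Sketch: each worker thread freely creates its own HTable; the deprecated
> // constructor looks up the ClusterConnection cached per HConnectionKey, so
> // all threads end up sharing one underlying connection.
> public class OldStyleWorker implements Runnable {
>   private final Configuration conf;
>
>   public OldStyleWorker(Configuration conf) {
>     this.conf = conf;
>   }
>
>   @Override
>   public void run() {
>     try (HTable table = new HTable(conf, TableName.valueOf("t1"))) {
>       Put put = new Put(Bytes.toBytes("row1"));
>       put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v"));
>       table.put(put);
>     } catch (IOException e) {
>       throw new RuntimeException(e);
>     }
>   }
> }
> {code}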
> Now:
> Users have to create the connection themselves, using code like the below, 
> as indicated in our recommendations
> {code}
>     Connection connection = ConnectionFactory.createConnection(conf);
>     Table table = connection.getTable(tableName);
> {code}
> And they must make sure *only one* connection is created per *process*, 
> instead of creating HTable instances freely, or else many connections may be 
> set up to zookeeper/RS across the threads. Users might also ask "when should 
> I close the connection I create?", and the answer is "make sure not to close 
> it until the *process* shuts down".
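> To make the "make sure" concrete, below is a minimal sketch of the kind of 
> process-wide singleton users are now expected to maintain themselves; the 
> class name ConnectionHolder is hypothetical, only ConnectionFactory and 
> Connection come from the client API:
> {code}
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.client.Connection;
> import org.apache.hadoop.hbase.client.ConnectionFactory;
>
> // Hypothetical helper: one Connection per process, shared by every thread.
> public final class ConnectionHolder {
>   private static volatile Connection connection;
>
>   private ConnectionHolder() {}
>
>   public static Connection get(Configuration conf) throws IOException {
>     if (connection == null) {
>       synchronized (ConnectionHolder.class) {
>         if (connection == null) {
>           // Connection is thread-safe but heavy-weight; create it only once.
>           connection = ConnectionFactory.createConnection(conf);
>         }
>       }
>     }
>     return connection;
>   }
>
>   // Call only at process shutdown, e.g. from a JVM shutdown hook.
>   public static synchronized void shutdown() throws IOException {
>     if (connection != null) {
>       connection.close();
>       connection = null;
>     }
>   }
> }
> {code}
> Each thread would then call ConnectionHolder.get(conf).getTable(tableName) 
> and close only the Table it obtains, never the shared Connection.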
> So now there are many more things for users to "make sure" of, but habits 
> are hard to change. Users are used to creating a table instance in each 
> thread (according to which table each request needs), so they will probably 
> still create connections everywhere, and operators will then have to 
> frantically resolve all kinds of problems...
> So I'm proposing to add back managed connection and connection caching 
> support. IMHO it's something good that already existed in our implementation, 
> so let's bring it back and save operators the workload when they decide to 
> upgrade from 1.x to 2.x
> Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
