Yu Li created HBASE-17009:
-----------------------------
Summary: Revisiting the removing of managed connection and connection caching
Key: HBASE-17009
URL: https://issues.apache.org/jira/browse/HBASE-17009
Project: HBase
Issue Type: Task
Reporter: Yu Li
Assignee: Yu Li
In HBASE-13197 we did a lot of good cleanup of the Connection API, but as part
of that work HBASE-13252 dropped the managed connection and connection caching
features. This JIRA proposes revisiting that decision, for the reasons below.
Assume a long-running process with multiple threads accessing HBase (a common
case for streaming applications), and compare what happened previously with
what happens now.
Previously:
Users could create an HTable instance whenever they wanted, without worrying
about the underlying connections, because the HBase client managed them
automatically: no matter how many threads there were, only one Connection
instance existed for a given configuration.
{code}
@Deprecated
public HTable(Configuration conf, final TableName tableName)
    throws IOException {
  ...
  // Obtain the shared, cached connection instead of creating a new one
  this.connection = ConnectionManager.getConnectionInternal(conf);
  ...
}

static ClusterConnection getConnectionInternal(final Configuration conf)
    throws IOException {
  // Connections are cached per key derived from the configuration
  HConnectionKey connectionKey = new HConnectionKey(conf);
  synchronized (CONNECTION_INSTANCES) {
    HConnectionImplementation connection =
        CONNECTION_INSTANCES.get(connectionKey);
    if (connection == null) {
      // First request for this key: create and cache a managed connection
      connection = (HConnectionImplementation) createConnection(conf, true);
      CONNECTION_INSTANCES.put(connectionKey, connection);
    } else if (connection.isClosed()) {
      // Cached connection was closed: evict it and replace it with a fresh one
      ConnectionManager.deleteConnection(connectionKey, true);
      connection = (HConnectionImplementation) createConnection(conf, true);
      CONNECTION_INSTANCES.put(connectionKey, connection);
    }
    // Reference count: each caller sharing the connection increments it
    connection.incCount();
    return connection;
  }
}
{code}
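To make the old behavior concrete outside HBase, the caching and reference counting above can be sketched as a small, self-contained Java registry. This is a simplified illustration, not HBase's code: `CachedConnectionRegistry` and `SharedConnection` are hypothetical names, a plain `String` key stands in for `HConnectionKey`, and "closing" just flips a flag instead of tearing down ZooKeeper/RegionServer connections. The release half (close when the last user lets go) is an assumption about how such a cache is drained; the excerpt above only shows acquisition.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a heavyweight client connection.
final class SharedConnection {
    private final String key;
    private int refCount = 0;
    private boolean closed = false;

    SharedConnection(String key) { this.key = key; }

    String key() { return key; }
    boolean isClosed() { return closed; }
    int refCount() { return refCount; }

    void incCount() { refCount++; }

    // Returns true when the last user has released the connection.
    boolean decCount() { return --refCount <= 0; }

    // Hypothetical: in a real client this would close sockets/sessions.
    void closeUnderlying() { closed = true; }
}

// Hypothetical registry mirroring the 1.x pattern:
// one cached connection per configuration key, reference-counted per user.
final class CachedConnectionRegistry {
    private final Map<String, SharedConnection> instances = new HashMap<>();

    SharedConnection acquire(String configKey) {
        synchronized (instances) {
            SharedConnection conn = instances.get(configKey);
            if (conn == null || conn.isClosed()) {
                // First user for this key, or the cached one was closed: create afresh.
                conn = new SharedConnection(configKey);
                instances.put(configKey, conn);
            }
            conn.incCount();
            return conn;
        }
    }

    void release(SharedConnection conn) {
        synchronized (instances) {
            if (conn.decCount()) {
                // Last reference gone: really close it and drop it from the cache.
                conn.closeUnderlying();
                instances.remove(conn.key(), conn);
            }
        }
    }

    int size() {
        synchronized (instances) { return instances.size(); }
    }
}
```

Callers never see the sharing: they acquire and release, and the registry decides when a physical connection is created or torn down, which is what let users create HTable instances freely in 1.x.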
Now:
Users have to create the connection themselves, using code like the following,
as our documentation recommends:
{code}
Connection connection = ConnectionFactory.createConnection(conf);
Table table = connection.getTable(tableName);
{code}
They must also make sure that *only one* connection is created per *process*,
instead of freely creating HTable instances; otherwise many threads may end up
setting up many connections to ZooKeeper and the RegionServers. Users may also
ask "when should I close the connection I created?", and the answer is "do not
close it until the *process* shuts down".
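The discipline described above (one connection per process, shared by all threads, closed only at shutdown) might look roughly like the following. This is a hedged sketch: `StubConnection` is a hypothetical stand-in for the HBase `Connection` interface so the example runs without an HBase dependency, its `getTable` returning a `String` is purely illustrative, and `ProcessConnection` is a hypothetical holder class. In real code, `new StubConnection()` would be replaced by `ConnectionFactory.createConnection(conf)`.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a client Connection; counts how many were created.
final class StubConnection implements AutoCloseable {
    static final AtomicInteger CREATED = new AtomicInteger();
    private volatile boolean closed = false;

    StubConnection() { CREATED.incrementAndGet(); }

    // Illustrative only: tables are lightweight views over the shared connection.
    String getTable(String tableName) {
        if (closed) throw new IllegalStateException("connection closed");
        return "table:" + tableName;
    }

    boolean isClosed() { return closed; }

    @Override
    public void close() { closed = true; }
}

// Hypothetical process-wide holder: lazily creates exactly one connection
// (double-checked locking on a volatile field) and closes it at JVM shutdown.
final class ProcessConnection {
    private static volatile StubConnection instance;

    private ProcessConnection() {}

    static StubConnection get() {
        StubConnection conn = instance;
        if (conn == null) {
            synchronized (ProcessConnection.class) {
                conn = instance;
                if (conn == null) {
                    conn = new StubConnection();
                    instance = conn;
                    // Close at process shutdown, not when an individual caller is done.
                    final StubConnection toClose = conn;
                    Runtime.getRuntime().addShutdownHook(new Thread(toClose::close));
                }
            }
        }
        return conn;
    }
}
```

Every thread calls `ProcessConnection.get()` and obtains its table from the shared connection; no caller ever closes it. It is this kind of extra bookkeeping that the proposal wants the client to take care of again.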
So users now have many more things they must "make sure" of, but habits are
hard to change. Users used to create a table instance in each thread (depending
on which table each request needed), so they will probably still create
connections everywhere, and operators will then have to scramble to resolve all
kinds of problems...
So I propose adding back support for managed connections and connection
caching. IMHO it was a good feature that already existed in our
implementation, so let's bring it back and save operators the extra work when
they decide to upgrade from 1.x to 2.x.
Thoughts?