Naveen Gangam created HIVE-13527:
------------------------------------
Summary: Using deprecated APIs in HBase client causes zookeeper
connection leaks.
Key: HIVE-13527
URL: https://issues.apache.org/jira/browse/HIVE-13527
Project: Hive
Issue Type: Bug
Components: HiveServer2
Affects Versions: 1.1.0
Reporter: Naveen Gangam
Assignee: Naveen Gangam
When running queries against HBase-backed Hive tables, the following log
messages appear in the HS2 log.
{code}
2016-04-11 07:25:23,657 WARN
org.apache.hadoop.hbase.mapreduce.TableInputFormatBase: You are using an HTable
instance that relies on an HBase-managed Connection. This is usually due to
directly creating an HTable, which is deprecated. Instead, you should create a
Connection object and then request a Table instance from it. If you don't need
the Table instance for your own use, you should instead use the
TableInputFormatBase.initalizeTable method directly.
2016-04-11 07:25:23,658 INFO
org.apache.hadoop.hbase.mapreduce.TableInputFormatBase: Creating an additional
unmanaged connection because user provided one can't be used for administrative
actions. We'll close it when we close out the table.
{code}
In one HS2 log file, 1366 ZooKeeper connections were established but only a
small fraction of them were closed, so lsof would show 1300+ open TCP
connections to ZooKeeper.
{code}
grep "org.apache.zookeeper.ClientCnxn: Session establishment complete on server" * | wc -l
1366
grep "INFO org.apache.zookeeper.ZooKeeper: Session:" * | grep closed | wc -l
54
{code}
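The same count can be scripted when comparing several log files. The following is a minimal sketch, not part of Hive or HBase: the class name ZkSessionLeakCount and its methods are hypothetical, and it assumes each log message sits on a single line, as the grep patterns do.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper (not part of Hive or HBase): counts ZooKeeper sessions
// that were established but never closed, using the same two log patterns
// as the grep commands.
final class ZkSessionLeakCount {
    static final String ESTABLISHED =
        "org.apache.zookeeper.ClientCnxn: Session establishment complete on server";
    static final String SESSION = "INFO org.apache.zookeeper.ZooKeeper: Session:";

    /** Number of lines reporting a newly established session. */
    static long established(List<String> lines) {
        return lines.stream().filter(l -> l.contains(ESTABLISHED)).count();
    }

    /** Number of session lines that also contain the word "closed". */
    static long closed(List<String> lines) {
        return lines.stream()
            .filter(l -> l.contains(SESSION) && l.contains("closed"))
            .count();
    }

    /** Sessions established but not reported as closed. */
    static long leaked(List<String> lines) {
        return established(lines) - closed(lines);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ZkSessionLeakCount <hs2-log-file>
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println("established=" + established(lines)
            + " closed=" + closed(lines)
            + " leaked=" + leaked(lines));
    }
}
```

With the counts reported above, 1366 - 54 leaves 1312 sessions that were never closed, which is consistent with the 1300+ open connections seen by lsof.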
According to the comments in TableInputFormatBase, the recommended approach for
subclasses such as HiveHBaseTableInputFormat is to call initializeTable()
instead of the setHTable() method that it currently uses.
"
Subclasses MUST ensure initializeTable(Connection, TableName) is called for an
instance to function properly. Each of the entry points to this class used by
the MapReduce framework, {@link #createRecordReader(InputSplit,
TaskAttemptContext)} and {@link #getSplits(JobContext)}, will call {@link
#initialize(JobContext)} as a convenient centralized location to handle
retrieving the necessary configuration information. If your subclass overrides
either of these methods, either call the parent version or call initialize
yourself.
"
Currently setHTable() also creates an additional Admin connection, even though
it is not needed.
The deprecated API calls should therefore be replaced.
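The ownership problem behind the leak can be illustrated with a toy model. This is only a sketch under stated assumptions: FakeConnection and FakeInputFormat are hypothetical stand-ins, not the HBase classes, and the model shows only that a connection opened per query must be closed by whoever owns it.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only: these classes are hypothetical stand-ins, NOT the HBase API.
final class ConnectionOwnershipSketch {
    /** Number of stand-in connections currently open. */
    static final AtomicInteger OPEN = new AtomicInteger();

    /** Stand-in for a client connection that holds a ZooKeeper session. */
    static final class FakeConnection implements AutoCloseable {
        private boolean closed;

        FakeConnection() {
            OPEN.incrementAndGet();
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                OPEN.decrementAndGet();
            }
        }
    }

    /** Stand-in for an input format that owns the connection it is given. */
    static final class FakeInputFormat implements AutoCloseable {
        private FakeConnection connection;

        void initializeTable(FakeConnection c) {
            this.connection = c;
        }

        @Override
        public void close() {
            if (connection != null) {
                connection.close();
                connection = null;
            }
        }
    }

    /** Leaky pattern: a connection is opened for the query and never closed. */
    static void runQueryLeaky() {
        FakeInputFormat format = new FakeInputFormat();
        format.initializeTable(new FakeConnection());
        // ... compute splits / read records ...
        // No close() call, so the connection stays open after the query.
    }

    /** Closed pattern: try-with-resources releases the connection on exit. */
    static void runQueryClosed() {
        try (FakeInputFormat format = new FakeInputFormat()) {
            format.initializeTable(new FakeConnection());
            // ... compute splits / read records ...
        }
    }
}
```

In this model every call to runQueryLeaky() raises the open count by one, while runQueryClosed() leaves it unchanged, which mirrors the gap between established and closed sessions in the log.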