Naveen Gangam created HIVE-13527:
------------------------------------

             Summary: Using deprecated APIs in HBase client causes zookeeper connection leaks.
                 Key: HIVE-13527
                 URL: https://issues.apache.org/jira/browse/HIVE-13527
             Project: Hive
          Issue Type: Bug
          Components: HiveServer2
    Affects Versions: 1.1.0
            Reporter: Naveen Gangam
            Assignee: Naveen Gangam


When running queries against HBase-backed Hive tables, the following messages appear in the HiveServer2 (HS2) log:
{code}
2016-04-11 07:25:23,657 WARN org.apache.hadoop.hbase.mapreduce.TableInputFormatBase: You are using an HTable instance that relies on an HBase-managed Connection. This is usually due to directly creating an HTable, which is deprecated. Instead, you should create a Connection object and then request a Table instance from it. If you don't need the Table instance for your own use, you should instead use the TableInputFormatBase.initalizeTable method directly.
2016-04-11 07:25:23,658 INFO org.apache.hadoop.hbase.mapreduce.TableInputFormatBase: Creating an additional unmanaged connection because user provided one can't be used for administrative actions. We'll close it when we close out the table.
{code}

One HS2 log file records 1366 established ZooKeeper sessions, but only 54 of them were closed. Accordingly, lsof would show more than 1300 open TCP connections to ZooKeeper. The counts come from:
{code}
$ grep "org.apache.zookeeper.ClientCnxn: Session establishment complete on server" * | wc -l
1366
$ grep "INFO org.apache.zookeeper.ZooKeeper: Session:" * | grep closed | wc -l
54
{code}
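The same tally can be reproduced without grep. The following minimal Java sketch applies the two substring matches from the grep commands to a list of log lines and reports how many sessions have no matching close line. The class name, method names, and sample lines are hypothetical. The sample lines only mimic the ZooKeeper log format and are not taken from the HS2 log. On the real log, the counts above give 1366 - 54 = 1312 sessions without a close, which is consistent with the 1300+ connections reported by lsof.

```java
import java.util.List;

/**
 * Hypothetical helper: counts ZooKeeper sessions opened and closed in HS2 log
 * lines, using the same substring patterns as the grep commands in this issue.
 */
public class ZkSessionTally {
    // Logged by the ZooKeeper client when a new session is established.
    static final String OPENED =
            "org.apache.zookeeper.ClientCnxn: Session establishment complete on server";
    // Prefix of ZooKeeper session lines; a close line also contains "closed".
    static final String SESSION = "INFO org.apache.zookeeper.ZooKeeper: Session:";

    // Made-up sample lines for illustration only (three opens, one close, one unrelated line).
    static final List<String> SAMPLE = List.of(
            "2016-04-11 07:25:24,001 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server zk1/10.0.0.1:2181, sessionid = 0x1, negotiated timeout = 60000",
            "2016-04-11 07:25:25,002 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server zk1/10.0.0.1:2181, sessionid = 0x2, negotiated timeout = 60000",
            "2016-04-11 07:25:26,003 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server zk1/10.0.0.1:2181, sessionid = 0x3, negotiated timeout = 60000",
            "2016-04-11 07:25:27,004 INFO org.apache.zookeeper.ZooKeeper: Session: 0x1 closed",
            "2016-04-11 07:25:28,005 WARN org.apache.hadoop.hbase.mapreduce.TableInputFormatBase: You are using an HTable instance that relies on an HBase-managed Connection.");

    // Equivalent of the first grep: lines reporting an established session.
    static long countOpened(List<String> lines) {
        return lines.stream().filter(line -> line.contains(OPENED)).count();
    }

    // Equivalent of the second grep pipeline: session lines that also mention "closed".
    static long countClosed(List<String> lines) {
        return lines.stream()
                .filter(line -> line.contains(SESSION) && line.contains("closed"))
                .count();
    }

    // Sessions with no matching close line in the log: candidate leaked connections.
    static long countUnclosed(List<String> lines) {
        return countOpened(lines) - countClosed(lines);
    }

    public static void main(String[] args) {
        // Prints: opened=3 closed=1 unclosed=2
        System.out.printf("opened=%d closed=%d unclosed=%d%n",
                countOpened(SAMPLE), countClosed(SAMPLE), countUnclosed(SAMPLE));
    }
}
```

The difference between the two counts is only an upper bound on leaks. A session that is still in use, or whose close was logged to a rotated file, also counts as unclosed.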

According to the Javadoc in TableInputFormatBase, subclasses such as HiveHBaseTableInputFormat should call initializeTable() instead of setHTable(), which HiveHBaseTableInputFormat currently uses:
{quote}
Subclasses MUST ensure initializeTable(Connection, TableName) is called for an instance to function properly. Each of the entry points to this class used by the MapReduce framework, {@link #createRecordReader(InputSplit, TaskAttemptContext)} and {@link #getSplits(JobContext)}, will call {@link #initialize(JobContext)} as a convenient centralized location to handle retrieving the necessary configuration information. If your subclass overrides either of these methods, either call the parent version or call initialize yourself.
{quote}

setHTable() also currently creates an additional unmanaged connection for administrative (Admin) actions, even though that connection is not needed. This is the second log message shown above.

The calls to these deprecated APIs should therefore be replaced.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
