[ 
https://issues.apache.org/jira/browse/HIVE-13527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-13527:
---------------------------------
    Status: Patch Available  (was: Open)

> Using deprecated APIs in HBase client causes zookeeper connection leaks.
> ------------------------------------------------------------------------
>
>                 Key: HIVE-13527
>                 URL: https://issues.apache.org/jira/browse/HIVE-13527
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 1.1.0
>            Reporter: Naveen Gangam
>            Assignee: Naveen Gangam
>         Attachments: HIVE-13527.patch
>
>
> When running queries against HBase-backed Hive tables, the following log 
> messages appear in the HiveServer2 (HS2) log.
> {code}
> 2016-04-11 07:25:23,657 WARN 
> org.apache.hadoop.hbase.mapreduce.TableInputFormatBase: You are using an 
> HTable instance that relies on an HBase-managed Connection. This is usually 
> due to directly creating an HTable, which is deprecated. Instead, you should 
> create a Connection object and then request a Table instance from it. If you 
> don't need the Table instance for your own use, you should instead use the 
> TableInputFormatBase.initalizeTable method directly.
> 2016-04-11 07:25:23,658 INFO 
> org.apache.hadoop.hbase.mapreduce.TableInputFormatBase: Creating an 
> additional unmanaged connection because user provided one can't be used for 
> administrative actions. We'll close it when we close out the table.
> {code}
> In an HS2 log file, 1366 ZooKeeper sessions were established but only 54 of 
> them were closed, so lsof would show 1300+ open TCP connections to 
> ZooKeeper.
> {code}
> grep "org.apache.zookeeper.ClientCnxn: Session establishment complete on server" * | wc -l
> 1366
> grep "INFO org.apache.zookeeper.ZooKeeper: Session:" * | grep closed | wc -l
> 54
> {code}
> According to the comments in TableInputFormatBase, subclasses such as 
> HiveHBaseTableInputFormat should call initializeTable() instead of the 
> deprecated setHTable(), which HiveHBaseTableInputFormat currently uses.
> "
> Subclasses MUST ensure initializeTable(Connection, TableName) is called for 
> an instance to function properly. Each of the entry points to this class used 
> by the MapReduce framework, {@link #createRecordReader(InputSplit, 
> TaskAttemptContext)} and {@link #getSplits(JobContext)}, will call {@link 
> #initialize(JobContext)} as a convenient centralized location to handle 
> retrieving the necessary configuration information. If your subclass 
> overrides either of these methods, either call the parent version or call 
> initialize yourself.
> "
> Currently, setHTable() also creates an additional Admin connection, even 
> though it is not needed.
> The deprecated API calls should therefore be replaced.
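
The grep-based check in the description (sessions established versus sessions closed) can also be sketched as a small Java program. This is only an illustration: the class and method names are hypothetical, the match strings are copied from the grep commands above, and the sketch assumes each log event sits on a single line.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class ZkSessionLeakCheck {
    // Substrings copied from the grep commands in the issue description.
    static final String ESTABLISHED =
        "org.apache.zookeeper.ClientCnxn: Session establishment complete on server";
    static final String SESSION = "INFO org.apache.zookeeper.ZooKeeper: Session:";

    /** Counts lines that report a newly established ZooKeeper session. */
    static long countEstablished(List<String> lines) {
        return lines.stream().filter(l -> l.contains(ESTABLISHED)).count();
    }

    /** Counts lines that report a closed session (same filter as grep | grep closed). */
    static long countClosed(List<String> lines) {
        return lines.stream()
            .filter(l -> l.contains(SESSION) && l.contains("closed"))
            .count();
    }

    /** Sessions that were established but never reported as closed. */
    static long countLeaked(List<String> lines) {
        return countEstablished(lines) - countClosed(lines);
    }

    public static void main(String[] args) throws IOException {
        // args[0] is assumed to be the path to one HS2 log file.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println("established=" + countEstablished(lines)
            + " closed=" + countClosed(lines)
            + " leaked=" + countLeaked(lines));
    }
}
```

With the counts reported in the description, the difference is 1366 - 54 = 1312, which is consistent with the 1300+ open connections seen in lsof.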



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
