[ https://issues.apache.org/jira/browse/HBASE-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199488#comment-13199488 ]
Bryan Keller commented on HBASE-3792:
-------------------------------------

The latest Cloudera code introduced the new reference-counting connection management. There appears to be a reference counter leak in the HTable constructor, so you'll see connection leaks again, and my patch doesn't fix it. As a hack for now, I force the connection to close by using reflection: I set the ref counter to 1 and then call close() on the connection. I do this after calling table.close() in TableInputFormat, TableRecordReader, and TableOutputFormat. I think I should log another bug, as the leak is not in the map reduce classes.

> TableInputFormat leaks ZK connections
> -------------------------------------
>
>                 Key: HBASE-3792
>                 URL: https://issues.apache.org/jira/browse/HBASE-3792
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.1
>         Environment: Java 1.6.0_24, Mac OS X 10.6.7
>            Reporter: Bryan Keller
>         Attachments: patch0.90.4, tableinput.patch
>
>
> The TableInputFormat creates an HTable using a new Configuration object, and
> it never cleans it up. When running a Mapper, the TableInputFormat is
> instantiated and the ZK connection is created. While this connection is not
> explicitly cleaned up, the Mapper process eventually exits and thus the
> connection is closed. Ideally the TableRecordReader would close the
> connection in its close() method rather than relying on the process to die
> for connection cleanup. This is fairly easy to implement by overriding
> TableRecordReader, and also overriding TableInputFormat to specify the new
> record reader.
> The leak occurs when the JobClient is initializing and needs to retrieve the
> splits. To get the splits, it instantiates a TableInputFormat. Doing so
> creates a ZK connection that is never cleaned up. Unlike the mapper, however,
> my job client process does not die. Thus the ZK connections accumulate.
> I was able to fix the problem by writing my own TableInputFormat that does
> not initialize the HTable in the getConf() method and does not have an HTable
> member variable. Rather, it has a variable for the table name. The HTable is
> instantiated where needed and then cleaned up. For example, in the
> getSplits() method, I create the HTable, then close the connection once the
> splits are retrieved. I also create the HTable when creating the record
> reader, and I have a record reader that closes the connection when done.
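
The workaround described in the issue (keep only the table name, create the table handle where it is needed, and close it when done) can be sketched roughly as below. This is a minimal illustration, not the attached patch: FakeTable, SplitPlanner, and regionStartKeys() are hypothetical stand-ins invented for this example and are not HBase classes or methods. The open-handle counter exists only to make a leak visible.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins only: none of these types are part of HBase.
class PerCallTableDemo {

    /** Stand-in for a table handle that owns a connection. */
    static final class FakeTable implements Closeable {
        // Number of handles currently open; a nonzero value after use means a leak.
        static final AtomicInteger OPEN = new AtomicInteger();

        private final String name;
        private boolean closed;

        FakeTable(String name) {
            this.name = name;
            OPEN.incrementAndGet();
        }

        /** Returns made-up region start keys so the sketch has something to split on. */
        List<String> regionStartKeys() {
            List<String> keys = new ArrayList<String>();
            keys.add(name + ":a");
            keys.add(name + ":m");
            return keys;
        }

        @Override
        public void close() {
            // Idempotent close: release the handle at most once.
            if (!closed) {
                closed = true;
                OPEN.decrementAndGet();
            }
        }
    }

    /** Holds only the table name; opens a handle per call and always closes it. */
    static final class SplitPlanner {
        private final String tableName;

        SplitPlanner(String tableName) {
            this.tableName = tableName;
        }

        List<String> getSplits() {
            FakeTable table = new FakeTable(tableName);
            try {
                return table.regionStartKeys();
            } finally {
                // Runs even if computing the splits throws.
                table.close();
            }
        }
    }

    public static void main(String[] args) {
        SplitPlanner planner = new SplitPlanner("t1");
        for (int i = 0; i < 3; i++) {
            planner.getSplits();
        }
        System.out.println("open handles: " + FakeTable.OPEN.get());
    }
}
```

Because the planner has no table member variable, a long-lived process such as the job client can call getSplits() repeatedly without accumulating open handles. The same try/finally shape would apply to a record reader that opens its table on creation and releases it in close().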