[ 
https://issues.apache.org/jira/browse/HBASE-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Keller updated HBASE-3792:
--------------------------------

    Attachment: tableinput.patch

Here's a patch demonstrating the changes I have implemented in my system, as 
described above. The patch is for trunk, so the changes are slightly different 
than what I am using for 0.90.4.
                
> TableInputFormat leaks ZK connections
> -------------------------------------
>
>                 Key: HBASE-3792
>                 URL: https://issues.apache.org/jira/browse/HBASE-3792
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 0.90.1
>         Environment: Java 1.6.0_24, Mac OS X 10.6.7
>            Reporter: Bryan Keller
>         Attachments: tableinput.patch
>
>
> The TableInputFormat creates an HTable using a new Configuration object, and 
> it never cleans it up. When running a Mapper, the TableInputFormat is 
> instantiated and the ZK connection is created. While this connection is not 
> explicitly cleaned up, the Mapper process eventually exits and thus the 
> connection is closed. Ideally the TableRecordReader would close the 
> connection in its close() method rather than relying on the process to die 
> for connection cleanup. This is fairly easy to implement by overriding 
> TableRecordReader, and also overriding TableInputFormat to specify the new 
> record reader.
> The leak occurs when the JobClient is initializing and needs to retrieve the 
> splits. To get the splits, it instantiates a TableInputFormat. Doing so 
> creates a ZK connection that is never cleaned up. Unlike the mapper, however, 
> my job client process does not die. Thus the ZK connections accumulate.
> I was able to fix the problem by writing my own TableInputFormat that does 
> not initialize the HTable in the setConf() method and does not have an HTable 
> member variable. Rather, it has a variable for the table name. The HTable is 
> instantiated where needed and then cleaned up. For example, in the 
> getSplits() method, I create the HTable, then close the connection once the 
> splits are retrieved. I also create the HTable when creating the record 
> reader, and I have a record reader that closes the connection when done.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
