JobClient.runJob leaks file descriptors
---------------------------------------

                 Key: MAPREDUCE-821
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-821
             Project: Hadoop Map/Reduce
          Issue Type: Bug
         Environment: Driver running on Ubuntu Jaunty x86, cluster running a 
Linux variant. 
            Reporter: Mark Desnoyer
            Priority: Critical


In a Java-based driver that runs multiple MapReduce jobs (e.g. Mahout's
K-means implementation), repeated calls to JobClient.runJob open RPC
connections that are never closed. As a result, the driver process leaks
file descriptors and eventually crashes with a "Too many open files" error
once the OS limit is reached.
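
For illustration, a minimal sketch of the kind of driver loop that triggers
the leak (the class name and job configuration are hypothetical; the only
essential part is the repeated JobClient.runJob call):

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class IterativeDriver {
      public static void main(String[] args) throws Exception {
        for (int i = 0; i < 1000; i++) {
          JobConf conf = new JobConf(IterativeDriver.class);
          conf.setJobName("iteration-" + i);
          // ... mapper/reducer/input/output configuration elided ...

          // Each call constructs a JobClient internally, which opens an
          // RPC connection to the JobTracker that is never closed.
          JobClient.runJob(conf);
        }
        // Long before the loop finishes, the process exhausts its file
        // descriptors and fails with "Too many open files".
      }
    }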

This has been verified on Hadoop 0.18.3 by running the driver: as new
MapReduce jobs are run, lsof -p on the driver process shows an increasing
number of open TCP connections to the cluster.

Looking at the current code in trunk, this appears to be caused by runJob
never calling close() on the JobClient object it creates. Since Java has no
destructors, and JobClient defines no finalizer that calls close(), the
connection is never released when the object is garbage collected.
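
As a sketch of the fix, the same try/finally shape that runJob itself would
need also works as a driver-side workaround in the meantime: construct the
JobClient explicitly and close it once the job finishes (the wrapper class
name here is hypothetical):

    import java.io.IOException;

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RunningJob;

    public class LeakFreeRunner {
      // Equivalent of JobClient.runJob that releases the client when done.
      public static RunningJob runJob(JobConf job) throws IOException {
        JobClient jc = new JobClient(job);
        try {
          RunningJob rj = jc.submitJob(job);
          rj.waitForCompletion();  // block until the job finishes
          if (!rj.isSuccessful()) {
            throw new IOException("Job failed: " + rj.getJobID());
          }
          return rj;
        } finally {
          jc.close();  // tear down the cached RPC connection to the JobTracker
        }
      }
    }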

I am going to verify this hypothesis and post a patch.
