[ 
https://issues.apache.org/jira/browse/HBASE-9003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190564#comment-14190564
 ] 

Nick Dimiduk commented on HBASE-9003:
-------------------------------------

IIRC, I introduced JarFinder for the purpose of launching jobs from the output 
committer of a running job. In this context, the dependency jars have been 
unpacked, so to launch the job, JarFinder is used to re-pack the class files 
into a jar.

Which raises an interesting point: are you seeing this accumulation of files 
under hadoop.tmp.dir even for regular jobs? Nothing should be created unless 
the requested class is found on the class path outside of a jar.
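
To make that condition concrete, here is a minimal sketch of how one might 
check whether a class was loaded from inside a jar or from a plain directory 
of class files. It is not the Hadoop JarFinder code; the class name 
ClassOrigin and the helper needsRepacking are hypothetical, and it assumes 
that only a "file" URL would call for re-packing.

```java
import java.net.URL;

public class ClassOrigin {

    /**
     * Returns the URL protocol of the resource that defines the class:
     * typically "jar" when it lives inside a jar and "file" when it sits
     * unpacked in a directory. Returns null if the resource is not found.
     */
    static String originProtocol(Class<?> klass) {
        String resource = klass.getName().replace('.', '/') + ".class";
        ClassLoader loader = klass.getClassLoader();
        // Classes from the bootstrap loader report a null class loader.
        URL url = (loader != null)
            ? loader.getResource(resource)
            : ClassLoader.getSystemResource(resource);
        return (url == null) ? null : url.getProtocol();
    }

    /** Hypothetical helper: only unpacked classes would need a new jar. */
    static boolean needsRepacking(Class<?> klass) {
        return "file".equals(originProtocol(klass));
    }

    public static void main(String[] args) {
        System.out.println(ClassOrigin.class.getName() + " -> "
            + originProtocol(ClassOrigin.class));
        System.out.println(String.class.getName() + " -> "
            + originProtocol(String.class));
    }
}
```

If regular jobs only reference classes that already live in jars, a check 
like this would never hit the "file" branch, and no temporary jar would be 
written.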

I don't remember the details; let me look into the code when I have a few 
minutes.

Back to the point at hand: +1 for fixing the accumulation problem.

> TableMapReduceUtil should not rely on org.apache.hadoop.util.JarFinder#getJar
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-9003
>                 URL: https://issues.apache.org/jira/browse/HBASE-9003
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>            Reporter: Esteban Gutierrez
>            Assignee: Esteban Gutierrez
>             Fix For: 2.0.0, 0.99.2
>
>         Attachments: HBASE-9003.v0.patch, HBASE-9003.v1.patch, 
> HBASE-9003.v2.patch, HBASE-9003.v2.patch
>
>
> This is the problem: {{TableMapReduceUtil#addDependencyJars}} relies on 
> {{org.apache.hadoop.util.JarFinder}}, if available, to call {{getJar()}}. 
> However, {{getJar()}} uses {{File.createTempFile()}} to create a temporary 
> file under {{hadoop.tmp.dir}}{{/target/test-dir}}. Due to HADOOP-9737, the 
> created jar and its contents are not purged after the JVM is destroyed. Since 
> most configurations point {{hadoop.tmp.dir}} under {{/tmp}}, the generated jar 
> files get purged by {{tmpwatch}} or a similar tool, but boxes that have 
> {{hadoop.tmp.dir}} pointing to a different location not monitored by 
> {{tmpwatch}} will pile up a collection of jars, causing all kinds of issues. 
> Since {{JarFinder#getJar}} is not a public Hadoop API (see [~tucu00]'s 
> comment on HADOOP-9737), we shouldn't use it in {{TableMapReduceUtil}}, in 
> order to avoid this kind of issue.
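
As an illustration of the accumulation described in the issue, the sketch 
below creates a temporary jar with File.createTempFile() and deletes it 
explicitly once it is no longer needed. This is not the attached patch and 
not Hadoop code: the class TempJarCleanup and its methods are hypothetical, 
and the scratch directory merely stands in for a location under 
hadoop.tmp.dir.

```java
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class TempJarCleanup {

    /** Writes a jar containing only a manifest into tmpDir. */
    static File createTempJar(File tmpDir) throws IOException {
        File jar = File.createTempFile("sketch-", ".jar", tmpDir);
        Manifest manifest = new Manifest();
        manifest.getMainAttributes()
            .put(Attributes.Name.MANIFEST_VERSION, "1.0");
        try (OutputStream out = Files.newOutputStream(jar.toPath());
             JarOutputStream jos = new JarOutputStream(out, manifest)) {
            // A real re-pack step would add the class file entries here.
        }
        return jar;
    }

    /**
     * Creates a temporary jar, "uses" it, and removes it in a finally
     * block. Returns the number of files left behind in tmpDir.
     */
    static int packUseAndDelete(File tmpDir) throws IOException {
        File jar = createTempJar(tmpDir);
        try {
            // Placeholder for submitting a job that ships the jar.
        } finally {
            Files.deleteIfExists(jar.toPath());
        }
        String[] remaining = tmpDir.list();
        return (remaining == null) ? 0 : remaining.length;
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("jar-sketch").toFile();
        System.out.println("files left behind: " + packUseAndDelete(dir));
        dir.delete();
    }
}
```

Without the explicit delete, every call would leave one more jar in the 
directory, which is the pile-up the issue describes for locations that 
tmpwatch does not monitor.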



