[
https://issues.apache.org/jira/browse/HBASE-9003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190564#comment-14190564
]
Nick Dimiduk commented on HBASE-9003:
-------------------------------------
IIRC, I introduced JarFinder for the purpose of launching jobs from the output
committer of a running job. In this context, the dependency jars have been
unpacked, so to launch the job, JarFinder is used to re-pack the class files
into a jar.
Which raises an interesting point: are you seeing this accumulation of files
under hadoop.tmp.dir even for regular jobs? Nothing should be created unless
the requested class is found on the class path outside of a jar.
I don't remember the details; let me look into the code when I have a few
minutes.
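For illustration only, here is a minimal sketch of how one might tell whether
a class was loaded from a jar or from an unpacked directory. It is not
JarFinder's actual code; the class name ClassOrigin and the method
originProtocol are hypothetical and exist only for this example.

```java
import java.net.URL;

public class ClassOrigin {
    /**
     * Returns the URL protocol of the resource that backs the given class,
     * for example "jar" for a class inside a jar or "file" for a class file
     * in an unpacked directory. Returns null if the resource is not found.
     */
    static String originProtocol(Class<?> clazz) {
        String resource = clazz.getName().replace('.', '/') + ".class";
        ClassLoader loader = clazz.getClassLoader();
        // Classes from the bootstrap loader report a null class loader,
        // so fall back to the system resource lookup in that case.
        URL url = (loader != null)
            ? loader.getResource(resource)
            : ClassLoader.getSystemResource(resource);
        return (url == null) ? null : url.getProtocol();
    }

    public static void main(String[] args) {
        System.out.println(originProtocol(String.class));
    }
}
```

Under this assumption, only the "file" case would need the class files
re-packed into a new jar; a class that already lives in a jar could be
shipped as-is.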
Back to the point at hand: +1 for fixing the accumulation problem.
> TableMapReduceUtil should not rely on org.apache.hadoop.util.JarFinder#getJar
> -----------------------------------------------------------------------------
>
> Key: HBASE-9003
> URL: https://issues.apache.org/jira/browse/HBASE-9003
> Project: HBase
> Issue Type: Bug
> Components: mapreduce
> Reporter: Esteban Gutierrez
> Assignee: Esteban Gutierrez
> Fix For: 2.0.0, 0.99.2
>
> Attachments: HBASE-9003.v0.patch, HBASE-9003.v1.patch,
> HBASE-9003.v2.patch, HBASE-9003.v2.patch
>
>
> This is the problem: {{TableMapReduceUtil#addDependencyJars}} relies on
> {{org.apache.hadoop.util.JarFinder}} if available to call {{getJar()}}.
> However, {{getJar()}} uses {{File.createTempFile()}} to create a temporary file
> under {{hadoop.tmp.dir}}{{/target/test-dir}}. Due to HADOOP-9737 the created jar
> and its contents are not purged after the JVM exits. Since most
> configurations point {{hadoop.tmp.dir}} under {{/tmp}}, the generated jar
> files get purged by {{tmpwatch}} or a similar tool, but boxes that have
> {{hadoop.tmp.dir}} pointing to a different location not monitored by
> {{tmpwatch}} will pile up a collection of jars, causing all kinds of issues.
> Since {{JarFinder#getJar}} is not a public Hadoop API (see [~tucu00]'s
> comment on HADOOP-9737), we shouldn't use it in
> {{TableMapReduceUtil}}, to avoid this kind of issue.
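
To make the accumulation described in the issue concrete, here is a minimal
sketch of the general File.createTempFile() behaviour. It does not reproduce
JarFinder or the attached patches; the class TempJarSketch and the helper
createTempJar are hypothetical names, and the "jar" is just an empty
placeholder file.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class TempJarSketch {
    /**
     * Hypothetical helper: creates an empty placeholder file with a .jar
     * suffix under dir. File.createTempFile() never removes the file on
     * its own; it is only deleted if someone asks for that explicitly.
     */
    static File createTempJar(File dir, boolean deleteOnExit) throws IOException {
        File jar = File.createTempFile("sketch-", ".jar", dir);
        if (deleteOnExit) {
            // Deletion is attempted only on normal JVM termination.
            jar.deleteOnExit();
        }
        return jar;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a temp directory that no cleanup tool watches.
        File dir = Files.createTempDirectory("tmpdir-sketch").toFile();
        File leaked = createTempJar(dir, false);
        File cleaned = createTempJar(dir, true);
        System.out.println("left behind after exit: " + leaked);
        System.out.println("removed at normal exit: " + cleaned);
    }
}
```

Each run leaves one more file in the directory unless the caller registers
it for deletion or deletes it explicitly, which is the pile-up pattern the
description reports for directories outside the reach of tmpwatch.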