[
https://issues.apache.org/jira/browse/HADOOP-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566245#action_12566245
]
Koji Noguchi commented on HADOOP-2735:
--------------------------------------
> Then you also have to set the mapred working dir to the new physical disk as
> well right?
>
I believe each task is already assigned its own working directory, and those
directories are spread across all the disks.
> Actually I think the way Pig handles big databags should be revisited.
>
I'm not qualified to comment on Pig. Pig just happened to be using
File.createTempFile.
I could have asked the Pig developers to set 'java.io.tmpdir=./tmp', but any
application could be using the temp directory.
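To illustrate the point about File.createTempFile, here is a minimal sketch
(not Pig's code). The class name TmpDirDemo and the "demo" prefix are made up
for illustration. The two-argument form writes to whatever java.io.tmpdir
points at, while the three-argument form lets the caller choose the directory.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; the names are illustrative, not from Pig or Hadoop.
public class TmpDirDemo {
    /** Two-argument form: the file lands in the directory named by java.io.tmpdir. */
    static File createInDefaultTmp() {
        try {
            File f = File.createTempFile("demo", ".tmp");
            f.deleteOnExit(); // best-effort cleanup when the JVM exits normally
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Three-argument form: the caller picks the directory explicitly. */
    static File createIn(File dir) {
        try {
            dir.mkdirs(); // createTempFile does not create the directory itself
            File f = File.createTempFile("demo", ".tmp", dir);
            f.deleteOnExit();
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("java.io.tmpdir = " + System.getProperty("java.io.tmpdir"));
        System.out.println("default:  " + createInDefaultTmp().getAbsolutePath());
        System.out.println("explicit: " + createIn(new File("./tmp")).getAbsolutePath());
    }
}
```

Running it with -Djava.io.tmpdir=./tmp (after creating ./tmp) should put both
files under the current working directory, which is the behavior the issue
asks for without touching application code.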
> Shall we create tmp directory in task's working directory and make
> java.io.tmpdir=./tmp always?
>
It would work for our setup.
However, I don't know how tmp is used elsewhere. People might use it to share
data among tasks (unlikely, but possible).
> Setting default tmp directory for java createTempFile (java.io.tmpdir)
> ----------------------------------------------------------------------
>
> Key: HADOOP-2735
> URL: https://issues.apache.org/jira/browse/HADOOP-2735
> Project: Hadoop Core
> Issue Type: New Feature
> Components: mapred
> Reporter: Koji Noguchi
> Assignee: Amareshwari Sri Ramadasu
> Priority: Critical
> Fix For: 0.16.1
>
> Attachments: patch-2735.txt
>
>
> On our cluster, we've seen Pig (http://incubator.apache.org/pig/) fill up
> /tmp and fail.
> (This is also inefficient, since all the local tasks were spilling to the
> same disk.)
> Pig simply uses the Java API createTempFile:
> http://java.sun.com/j2se/1.5.0/docs/api/java/io/File.html#createTempFile(java.lang.String,%20java.lang.String,%20java.io.File
> Can we add -Djava.io.tmpdir="./tmp" somewhere? That way:
> 1) Tasks can utilize all disks when using tmp
> 2) Any undeleted tmp files will be deleted by the tasktracker when the
> task (job?) is done.
> The easiest way is to set it inside mapred.child.java.opts in the config, but
> this can be overwritten if users set their own task heap size.
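One way to picture the concern about user-supplied options is the sketch
below. It is not the attached patch and not Hadoop code. ChildJavaOpts and
withTmpDir are hypothetical names. The idea is to append the tmpdir setting to
whatever option string the user supplied, unless they already chose a tmpdir
themselves.

```java
// Hypothetical helper; not part of Hadoop and not the attached patch.
public class ChildJavaOpts {
    // Assumed default, taken from the value proposed in this issue.
    static final String TMPDIR_OPT = "-Djava.io.tmpdir=./tmp";

    /** Appends the default tmpdir option unless the user already set one. */
    static String withTmpDir(String userOpts) {
        String opts = (userOpts == null) ? "" : userOpts.trim();
        for (String token : opts.split("\\s+")) {
            if (token.startsWith("-Djava.io.tmpdir=")) {
                return opts; // respect an explicit user choice
            }
        }
        return opts.isEmpty() ? TMPDIR_OPT : opts + " " + TMPDIR_OPT;
    }

    public static void main(String[] args) {
        // A user who only set the heap size still gets the per-task tmp dir.
        System.out.println(withTmpDir("-Xmx512m"));
        // prints: -Xmx512m -Djava.io.tmpdir=./tmp
    }
}
```

With this approach a user who overrides the options just to change the heap
size would not silently lose the tmpdir setting. Whether the framework should
do this, or set the property some other way, is the open question here.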
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.