[ 
https://issues.apache.org/jira/browse/HADOOP-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566095#action_12566095
 ] 

Pi Song commented on HADOOP-2735:
---------------------------------

That might be caused by big databags being spilled to disk in the tmp dir.

>Can we add -Djava.io.tmpdir="./tmp" somewhere ?

>so that, 

>1) Tasks can utilize all disks when using tmp

I don't understand this, Koji. So your intention is just to point the tmp 
directory to another physical disk, right? If that is the case, how will 
setting java.io.tmpdir=./tmp help? Wouldn't you then also have to set the 
mapred working dir to the new physical disk?
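
To make the question concrete, here is a minimal Java sketch of where 
createTempFile puts its files. This is not Pig's actual spill code; 
TmpDirSketch, createSpillFile and the "spill" prefix are made up for 
illustration. It assumes the process working directory is where the files 
should end up, which for a Hadoop task depends on how the tasktracker lays out 
the task directories.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper, for illustration only.
class TmpDirSketch {

    // Directory that the two-argument createTempFile(prefix, suffix) uses by
    // default: whatever java.io.tmpdir was set to when the JVM started.
    static File defaultTmpDir() {
        return new File(System.getProperty("java.io.tmpdir")).getAbsoluteFile();
    }

    // Create a spill file in an explicitly chosen directory by using the
    // three-argument createTempFile(prefix, suffix, directory).
    static File createSpillFile(File dir) throws IOException {
        if (!dir.isDirectory() && !dir.mkdirs()) {
            throw new IOException("cannot create directory " + dir);
        }
        File f = File.createTempFile("spill", ".tmp", dir);
        f.deleteOnExit(); // best-effort cleanup when the JVM exits normally
        return f;
    }

    public static void main(String[] args) throws IOException {
        // "tmp" is relative, so it resolves against the current working dir.
        File taskTmp = new File("tmp").getAbsoluteFile();
        File spill = createSpillFile(taskTmp);
        System.out.println("default tmp dir: " + defaultTmpDir());
        System.out.println("spill file:      " + spill);
    }
}
```

As far as I know, a relative java.io.tmpdir such as ./tmp is resolved against 
the JVM's working directory in the same way, so it would only spread the load 
across disks if each task's working directory already sits on a different disk.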

Actually, I think the way Pig handles big databags should be revisited. 
Spilling to local disk and reading it back doesn't sound efficient to me. If 
the databag is really too big, why can't it just spill to HDFS to take 
advantage of disk parallelism?

> Setting default tmp directory for java createTempFile (java.io.tmpdir)
> ----------------------------------------------------------------------
>
>                 Key: HADOOP-2735
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2735
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Koji Noguchi
>            Assignee: Amareshwari Sri Ramadasu
>            Priority: Critical
>             Fix For: 0.16.1
>
>         Attachments: patch-2735.txt
>
>
> On our cluster, we've seen Pig (http://incubator.apache.org/pig/) filling up 
> /tmp and failing. 
> (It is also inefficient, since all the local tasks were spilling to the same disk.)
> Pig is simply using the Java API createTempFile, 
> http://java.sun.com/j2se/1.5.0/docs/api/java/io/File.html#createTempFile(java.lang.String,%20java.lang.String,%20java.io.File
> Can we add -Djava.io.tmpdir="./tmp" somewhere ?
> so that, 
> 1) Tasks can utilize all disks when using tmp
> 2) Any undeleted tmp files will be deleted by the tasktracker when task(job?) 
> is done.
> The easiest way is to set it inside mapred.child.java.opts in the config, but 
> this can be overridden if users set their own task heap size.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
