The temp files between MR jobs are stored on DFS. They have to be on DFS
because they are the inputs to the next MR job.
-Thejas


On 9/13/10 3:15 PM, "jiang licht" <licht_ji...@yahoo.com> wrote:

All these settings should point to non-dfs folders. But I saw some pig jobs 
save intermediate outputs to "/tmp" in HDFS (maybe just in "interactive mode", 
not sure if I remember correctly, will check this), which means they get 
replicated and use much more space.

Thanks,

Michael

--- On Mon, 9/13/10, Mr. Jan Walter <hopping_...@yahoo.com> wrote:

From: Mr. Jan Walter <hopping_...@yahoo.com>
Subject: Re: specify temp folder?
To: pig-user@hadoop.apache.org
Date: Monday, September 13, 2010, 5:01 PM

Set the following parameter in your workers' mapred-site.xml, and change the
value to what you want:

<property>
  <name>mapred.child.tmp</name>
  <value>/tmp</value>
  <description> To set the value of tmp directory for map and reduce tasks.
  If the value is an absolute path, it is directly assigned. Otherwise, it is
  prepended with task's working directory. The java tasks are executed with
  option -Djava.io.tmpdir='the absolute path of the tmp dir'. Pipes and
  streaming are set with environment variable,
   TMPDIR='the absolute path of the tmp dir'
  </description>
</property>


In core-site.xml, set the hadoop.tmp.dir property the same way as above. I am
not sure how they all interrelate.
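
For example, the core-site.xml entry could look something like the following.
This is only a sketch: /data/hadoop/tmp is a made-up placeholder path, not a
recommended value, so substitute a local directory that exists on your nodes.

<property>
  <name>hadoop.tmp.dir</name>
  <!-- hypothetical path, chosen only for illustration -->
  <value>/data/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>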


There is also a tmpdir variable for the JVM, but I am not sure what reads
that. I just set them all to the same value.



----- Original Message ----
> From: jiang licht <licht_ji...@yahoo.com>
> To: pig-user@hadoop.apache.org
> Sent: Mon, September 13, 2010 5:23:12 PM
> Subject: specify temp folder?
>
> It seems that pig generates some folders/files under "/tmp" in HDFS for pig
> jobs. I remember that hadoop saves such intermediate results (map output, etc.)
> in non-hdfs folders, which are specified in mapred-site.xml. So, is there a way
> to tell pig to store such data to a non-hdfs folder?
>
> Thanks,
>
> Michael
>
>
>








