Pig writes the intermediate results between MR jobs to HDFS. Map output goes into local files, as in any other MR job.

Results passed between MR jobs belong in HDFS, where they get replicated. Otherwise your next MR job will not have enough nodes where it can run next to its input, and you're much more likely to be pulling data across the network.

Alan.

On Sep 13, 2010, at 3:15 PM, jiang licht wrote:

All these settings should point to non-dfs folders. But I saw some pig jobs save intermediate outputs to "/tmp" in HDFS (maybe just in "interactive mode", not sure if I remember correctly, will check this), which means they get replicated and use much more space.

Thanks,

Michael

--- On Mon, 9/13/10, Mr. Jan Walter <hopping_...@yahoo.com> wrote:

From: Mr. Jan Walter <hopping_...@yahoo.com>
Subject: Re: specify temp folder?
To: pig-user@hadoop.apache.org
Date: Monday, September 13, 2010, 5:01 PM

Set the following parameter in your workers' mapred-site.xml, and change the
value to what you want:

<property>
  <name>mapred.child.tmp</name>
  <value>/tmp</value>
  <description>To set the value of tmp directory for map and reduce tasks.
  If the value is an absolute path, it is directly assigned. Otherwise, it is
  prepended with task's working directory. The java tasks are executed with
  option -Djava.io.tmpdir='the absolute path of the tmp dir'. Pipes and
  streaming are set with environment variable,
  TMPDIR='the absolute path of the tmp dir'
  </description>
</property>


In core-site.xml, set the hadoop.tmp.dir property the same way as above. I am
not sure how they all interrelate.
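
As a minimal sketch of the core-site.xml entry described above, assuming the same /tmp value used in the mapred.child.tmp example (the value is only an illustration; substitute whatever local path you want):

```xml
<!-- core-site.xml: sketch only; the /tmp value is an assumption
     copied from the mapred.child.tmp example, not a recommendation -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp</value>
</property>
```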


There is also a tmpdir variable for the JVM; I am not sure what reads that. I
just set them all to the same value.



----- Original Message ----
From: jiang licht <licht_ji...@yahoo.com>
To: pig-user@hadoop.apache.org
Sent: Mon, September 13, 2010 5:23:12 PM
Subject: specify temp folder?

It seems that pig generates some folders/files under "/tmp" in HDFS for pig jobs. I remember that hadoop saves such intermediate results (map output, etc.) in non-hdfs folders, which are specified in mapred-site.xml. So, is there a way to tell pig to store such data in a non-hdfs folder?

Thanks,

Michael









