Pig writes the results passed between MR jobs to HDFS. Map output goes
into local files (as in any other MR job).
For results passed between MR jobs, you want them in HDFS, where they
will be replicated. Otherwise your next MR job will have too few nodes
where it can run next to its input, and you're much more likely to be
pulling data across the network.
Alan.
On Sep 13, 2010, at 3:15 PM, jiang licht wrote:
All these settings should point to non-dfs folders. But I saw some
pig jobs save intermediate outputs to "/tmp" in HDFS (maybe just in
"interactive mode", not sure if I remember correctly, will check
this), which means they get replicated and use much more space.
Thanks,
Michael
--- On Mon, 9/13/10, Mr. Jan Walter <hopping_...@yahoo.com> wrote:
From: Mr. Jan Walter <hopping_...@yahoo.com>
Subject: Re: specify temp folder?
To: pig-user@hadoop.apache.org
Date: Monday, September 13, 2010, 5:01 PM
Set the following parameter in your workers' mapred-site.xml, and
change the value to what you want:
<property>
<name>mapred.child.tmp</name>
<value>/tmp</value>
<description> To set the value of tmp directory for map and reduce
tasks. If the value is an absolute path, it is directly assigned.
Otherwise, it is prepended with task's working directory. The java
tasks are executed with option
-Djava.io.tmpdir='the absolute path of the tmp dir'. Pipes and
streaming are set with environment variable,
TMPDIR='the absolute path of the tmp dir'
</description>
</property>
In core-site.xml, set the hadoop.tmp.dir property the same way as
above. I am not sure how they all interrelate.
There is also a tmpdir variable for the JVM, I am not sure what reads
that. I just set them all the same.
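As a rough sketch, the core-site.xml entry mentioned above might look
like the fragment below. The /tmp value only mirrors the
mapred.child.tmp example and is an assumption, not a recommendation;
pick whatever path suits your cluster.

```xml
<!-- core-site.xml: illustrative only; the value is an assumed example -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp</value>
</property>
```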
----- Original Message ----
From: jiang licht <licht_ji...@yahoo.com>
To: pig-user@hadoop.apache.org
Sent: Mon, September 13, 2010 5:23:12 PM
Subject: specify temp folder?
It seems that pig generates some folders/files under "/tmp" in HDFS
for pig jobs. I remember that hadoop saves such intermediate results
(map output, etc.) in non-hdfs folders, which are specified in
mapred-site.xml. So, is there a way to tell pig to store such data to
a non-hdfs folder?
Thanks,
Michael