You can enable the following options on the command line:

    pig -Dpig.delete.temp.files=false -Dpig.temp.dir=/foo

Pig will then keep its intermediate files in /foo on HDFS. These properties must be passed on the command line; otherwise, Pig's Main class will delete all the temporary files at exit.

To load the temporary files, first look up pig.reduce.output.dirs in the job conf. Its value will be something like /foo/temp610061619/tmp-1397105382. The files are stored in whatever format you configured for temporary files in your Pig job, such as TFile or SequenceFile. If you're using TFile, you can do the following:

    a = LOAD '/foo/temp610061619/tmp-1397105382' USING org.apache.pig.impl.io.TFileStorage();
    DUMP a;

This feature was introduced in Pig 0.13.

On Fri, Aug 1, 2014 at 12:26 AM, Keren Ouaknine wrote:
> Hello,
>
> I am running a Pig job, and would like to see the intermediate outputs
> flowing out from the combiners into the reducers for this query. Is there a
> flag I could activate (some kind of verbose or debug mode) that would
> output these values?
>
> Thanks,
> Keren
>
> --
> Keren Ouaknine
> www.kereno.com
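If you have saved a copy of the job configuration XML locally (for example, downloaded from the job tracker web UI), the pig.reduce.output.dirs lookup can be scripted. This is a minimal sketch, not part of Pig: the function name reduce_output_dirs and the file name job_conf.xml are hypothetical, and it assumes the property appears as a <name> element followed directly by its <value> element, as in standard Hadoop configuration files. It relies on grep -o, which GNU and BSD grep support.

```shell
#!/bin/sh
# Hypothetical helper: print the value of pig.reduce.output.dirs from a
# locally saved Hadoop job configuration XML file (passed as $1).
# Assumes <name>pig.reduce.output.dirs</name> is followed by its <value>.
reduce_output_dirs() {
  conf="$1"
  # Delete newlines so a <name>/<value> pair split across lines still matches,
  # keep only the matching name/value fragment, then strip everything but the value.
  tr -d '\n' < "$conf" \
    | grep -o '<name>pig\.reduce\.output\.dirs</name>[^<]*<value>[^<]*</value>' \
    | sed 's:.*<value>\([^<]*\)</value>:\1:'
}
```

For example, reduce_output_dirs job_conf.xml prints a path such as /foo/temp610061619/tmp-1397105382, which you can then pass to the LOAD statement.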
