Hi Harsh,

Thanks for your reply, but I don't quite catch what you mean. According to the description:
<property>
  <name>keep.task.files.pattern</name>
  <value>.*_m_0000*</value>
  <description>Keep all files from tasks whose task names match the given regular expression. Defaults to none.</description>
</property>

Isn't that pattern matched against the task name? And the task name is something like task_201208101126_0004_m_000000? So shouldn't this pattern keep all the data from the matching tasks from being cleaned up?

If this doesn't work, can you kindly show me the exact pattern I should put here for the map->intermediate->reduce intermediate file (the merged partition file waiting to be shuffled to the reduce tasks)? I tried ".out*", but it doesn't work either. Or should I modify some other property instead?

Best Regards,
Raymond Liu

> -----Original Message-----
> From: Harsh J [mailto:ha...@cloudera.com]
> Sent: Friday, August 10, 2012 12:29 PM
> To: common-user@hadoop.apache.org
> Subject: Re: How can I get the intermediate output file from mapper class?
>
> Hi,
>
> You need the "file.out" and "file.out.index" files when wanting the
> map->intermediate->reduce files. So try a pattern that matches these
> and you should have it.
>
> The "XXXXX" kind of files are what MR produces on HDFS as regular outputs -
> these aren't intermediate.
>
> On Fri, Aug 10, 2012 at 8:52 AM, Liu, Raymond <raymond....@intel.com> wrote:
> > Hi
> >
> > I am trying to access the intermediate file saved to the local filesystem from MapReduce's mapper output.
> >
> > I have googled this one:
> > http://stackoverflow.com/questions/7867608/hadoop-mapreduce-intermediate-output
> >
> > I am using Hadoop 1.0.3, and I did set the following property in mapred-site.xml:
> >
> > <property>
> >   <name>keep.task.files.pattern</name>
> >   <value>.*_m_00000*</value>
> > </property>
> >
> > Then, after restarting Hadoop and running some jobs, I did see tasks in my local dir, like:
> >
> > /mnt/DP_disk1/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
> >
> > But I still cannot find any output dir there.
> >
> > I have four disks mounted for the local dir, and only the jars and work dirs are found. The local dirs are configured as follows:
> >
> > <property>
> >   <name>mapred.local.dir</name>
> >   <value>/mnt/DP_disk1/raymond/hdfs/mapred,/mnt/DP_disk2/raymond/hdfs/mapred,/mnt/DP_disk3/raymond/hdfs/mapred,/mnt/DP_disk4/raymond/hdfs/mapred</value>
> > </property>
> >
> > Then I searched through them:
> >
> > raymond@sr173:~$ ls /mnt/DP_disk1/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
> > jars  job.xml
> > raymond@sr173:~$ ls /mnt/DP_disk2/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
> > raymond@sr173:~$ ls /mnt/DP_disk3/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
> > jobToken  work
> > raymond@sr173:~$ ls /mnt/DP_disk4/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
> >
> > And I also searched the ttprivate dir, no luck there:
> >
> > raymond@sr173:~$ ls /mnt/DP_disk4/raymond/hdfs/mapred/ttprivate/taskTracker/raymond/jobcache/job_201208101040_0003/attempt_201208101040_0003_m_000021_0/taskjvm.sh
> > /mnt/DP_disk4/raymond/hdfs/mapred/ttprivate/taskTracker/raymond/jobcache/job_201208101040_0003/attempt_201208101040_0003_m_000021_0/taskjvm.sh
> >
> > So, is there anything I am still missing?
> >
> > Best Regards,
> > Raymond Liu
>
> --
> Harsh J
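
P.S. To check what the regular expression itself accepts, independent of Hadoop, here is a minimal standalone sketch. It tests the pattern ".*_m_0000*" against the task name and the attempt directory name that appear in this thread. The class and method names (KeepPatternCheck, fullMatch, partialMatch) are hypothetical, and it is only an assumption that the framework applies the pattern with full-match semantics. Whether Hadoop 1.0.3 compares it to the task name or to the attempt ID is not confirmed here.

```java
import java.util.regex.Pattern;

public class KeepPatternCheck {
    // Full-match check: the whole candidate string must match the regex.
    static boolean fullMatch(String regex, String candidate) {
        return Pattern.matches(regex, candidate);
    }

    // Substring check: the regex may match anywhere inside the candidate.
    static boolean partialMatch(String regex, String candidate) {
        return Pattern.compile(regex).matcher(candidate).find();
    }

    public static void main(String[] args) {
        // Pattern value taken from the property quoted in this thread.
        String pattern = ".*_m_0000*";
        // Task name as written in this thread.
        String taskName = "task_201208101126_0004_m_000000";
        // Attempt directory name from the ttprivate listing in this thread.
        String attemptId = "attempt_201208101040_0003_m_000021_0";

        System.out.println("task name, full match:     " + fullMatch(pattern, taskName));
        System.out.println("attempt id, full match:    " + fullMatch(pattern, attemptId));
        System.out.println("attempt id, partial match: " + partialMatch(pattern, attemptId));
    }
}
```

Under full-match semantics the task name is accepted but the attempt ID is not, because the attempt ID continues past the run of zeros. So if the comparison happens against attempt IDs, this pattern may never match, which would be one possible explanation for the files being cleaned up.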