Hi Harsh

        Thanks for your reply, though I don't quite understand what you mean.
According to the description:

<property>
  <name>keep.task.files.pattern</name>
  <value>.*_m_0000*</value>
  <description>Keep all files from tasks whose task names match the given
               regular expression. Defaults to none.</description>
</property>


        Isn't that pattern matched against the task name? A task name looks
like task_201208101126_0004_m_000000, so shouldn't this pattern keep all the
data from those tasks from being cleaned up?
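To check the reasoning above, here is a small Java sketch that tests the configured regular expression against the task name above and against the attempt ID that appears in the ttprivate path further down. Two things are assumptions here, not confirmed behaviour of Hadoop 1.0.3: which string the pattern is actually matched against, and whether a whole-string match (`Pattern.matches`) or a substring match (`Matcher.find`) is used. The sketch just prints both outcomes. The class name `KeepPatternCheck` is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper: compares whole-string and substring matching of a
// keep.task.files.pattern value against a task or attempt ID string.
public class KeepPatternCheck {

    // Whole-string match: the entire ID must match the regex.
    static boolean fullMatch(String regex, String id) {
        return Pattern.matches(regex, id);
    }

    // Substring match: succeeds if any part of the ID matches the regex.
    static boolean partialMatch(String regex, String id) {
        return Pattern.compile(regex).matcher(id).find();
    }

    public static void main(String[] args) {
        String regex = ".*_m_0000*";
        String[] ids = {
            "task_201208101126_0004_m_000000",      // task name from the message
            "attempt_201208101040_0003_m_000021_0"  // attempt ID from the ttprivate path
        };
        for (String id : ids) {
            System.out.printf("%s full=%b partial=%b%n",
                id, fullMatch(regex, id), partialMatch(regex, id));
        }
    }
}
```

Under a whole-string match, the task name matches but the attempt ID does not, because `0*` cannot absorb the trailing `21_0`; under a substring match both match. The sketch only illustrates that difference and does not show which one Hadoop uses.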

        If this doesn't work, could you kindly show me the exact pattern I
should put here for the map->intermediate->reduce file (the merged partition
file waiting to be shuffled to the reduce tasks)? I tried ".out*", and it
doesn't work either.

Or should I modify some other property instead?
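To check whether any map output files are being kept at all, a small Java sketch like the one below could split a comma-separated `mapred.local.dir` value (here the four-disk value from the configuration quoted below) and walk each root for files named `file.out` or `file.out.index`, the names Harsh mentions. It assumes the kept files stay somewhere under one of those roots, and it uses only the JDK, not any Hadoop API. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: searches every mapred.local.dir root for the
// map output files named file.out and file.out.index.
public class FindMapOutputs {

    static List<Path> find(String localDirs) throws IOException {
        List<Path> hits = new ArrayList<>();
        for (String dir : localDirs.split(",")) {
            Path root = Paths.get(dir.trim());
            if (!Files.isDirectory(root)) {
                continue; // skip roots that do not exist on this node
            }
            // Files.walk throws UncheckedIOException if it reaches a directory
            // it cannot read (for example a private ttprivate subtree).
            try (Stream<Path> paths = Files.walk(root)) {
                hits.addAll(paths
                    .filter(Files::isRegularFile)
                    .filter(p -> {
                        String name = p.getFileName().toString();
                        return name.equals("file.out") || name.equals("file.out.index");
                    })
                    .collect(Collectors.toList()));
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        String localDirs = "/mnt/DP_disk1/raymond/hdfs/mapred,"
            + "/mnt/DP_disk2/raymond/hdfs/mapred,"
            + "/mnt/DP_disk3/raymond/hdfs/mapred,"
            + "/mnt/DP_disk4/raymond/hdfs/mapred";
        for (Path p : find(localDirs)) {
            System.out.println(p);
        }
    }
}
```

Searching by file name, rather than by a guessed directory layout, avoids depending on exactly where under `taskTracker/` or `ttprivate/` the files end up. It needs to run as a user that can read all four roots.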


Best Regards,
Raymond Liu

> -----Original Message-----
> From: Harsh J [mailto:ha...@cloudera.com]
> Sent: Friday, August 10, 2012 12:29 PM
> To: common-user@hadoop.apache.org
> Subject: Re: How can I get the intermediate output file from mapper class?
> 
> Hi,
> 
> You need the "file.out" and "file.out.index" files when wanting the
> map->intermediate->reduce files. So try a pattern that matches these
> and you should have it.
> 
> The "XXXXX" kind of files are what MR produces on HDFS as regular outputs -
> these aren't intermediate.
> 
> On Fri, Aug 10, 2012 at 8:52 AM, Liu, Raymond <raymond....@intel.com> wrote:
> > Hi
> >
> >         I am trying to access the intermediate files that MapReduce saves
> > to the local filesystem from the mapper output.
> >
> >         I found this one via Google:
> > http://stackoverflow.com/questions/7867608/hadoop-mapreduce-intermediate-output
> >
> >         I am using Hadoop 1.0.3, and I set the following property in
> > mapred-site.xml:
> >
> > <property>
> >   <name>keep.task.files.pattern</name>
> >   <value>.*_m_00000*</value>
> > </property>
> >
> > Then, after restarting Hadoop and running some jobs, I did see entries
> > in my local dir such as:
> >
> >
> /mnt/DP_disk1/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201
> > 208101040_0003/
> >
> > But I still cannot find any output dir there.
> >
> > I have four disks mounted for the local dir, and only the jars and work
> > dirs are found. The configuration is as follows:
> >
> > <property>
> >   <name>mapred.local.dir</name>
> >   <value>/mnt/DP_disk1/raymond/hdfs/mapred,/mnt/DP_disk2/raymond/hdfs/mapred,/mnt/DP_disk3/raymond/hdfs/mapred,/mnt/DP_disk4/raymond/hdfs/mapred</value>
> > </property>
> >
> > Then I searched through them:
> >
> > raymond@sr173:~$ ls /mnt/DP_disk1/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
> > jars  job.xml
> > raymond@sr173:~$ ls /mnt/DP_disk2/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
> > raymond@sr173:~$ ls /mnt/DP_disk3/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
> > jobToken  work
> > raymond@sr173:~$ ls /mnt/DP_disk4/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
> >
> > I also searched the ttprivate dir, with no luck:
> >
> > raymond@sr173:~$ ls /mnt/DP_disk4/raymond/hdfs/mapred/ttprivate/taskTracker/raymond/jobcache/job_201208101040_0003/attempt_201208101040_0003_m_000021_0/taskjvm.sh
> > /mnt/DP_disk4/raymond/hdfs/mapred/ttprivate/taskTracker/raymond/jobcache/job_201208101040_0003/attempt_201208101040_0003_m_000021_0/taskjvm.sh
> >
> > So, is there anything I am still missing?
> >
> >
> > Best Regards,
> > Raymond Liu
> >
> 
> 
> 
> --
> Harsh J
