Aayush,

You can use the following property. Just play around with the pattern:

 <property>
   <name>keep.task.files.pattern</name>
   <value>.*_m_123456_0</value>
   <description>Keep all files from tasks whose task names match the given
                regular expression. Defaults to none.</description>
 </property>
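To get a feel for which task attempts a given pattern would keep, here is a small sketch. Hadoop matches the pattern against the task attempt name using Java regular expressions; Python's re syntax is close enough for illustration, and the attempt names below are made up as examples of the usual attempt_<jobid>_<m|r>_<tasknum>_<attempt> shape:

```python
import re

# Hypothetical attempt names, for illustration only.
names = [
    "attempt_201203270001_0001_m_123456_0",  # map task 123456, attempt 0
    "attempt_201203270001_0001_m_000007_0",  # a different map task
    "attempt_201203270001_0001_r_123456_0",  # a reduce task, not a map
]

# The example value from the property above: keep only map task 123456,
# attempt 0.
pattern = re.compile(r".*_m_123456_0")

kept = [n for n in names if pattern.match(n)]
print(kept)  # only the first name matches
```

Widening the pattern (e.g. `.*_m_.*` for all map attempts, or `.*` for everything) keeps correspondingly more task directories around, so watch your disk usage on mapred.local.dir if you set it broadly.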


Raj



>________________________________
> From: aayush <aayushgupta...@gmail.com>
>To: common-user@hadoop.apache.org 
>Sent: Tuesday, March 27, 2012 5:18 AM
>Subject: Re: Separating mapper intermediate files
> 
>Thanks Harsh.
>
>I set the mapred.local.dir as you suggested. It creates 4 folders in it for 
>jobtracker, tasktracker, tt_private, etc. I could not see an attempt directory. 
>Can you let me know exactly where to look in this directory structure?
>
>Furthermore, it seems that all the intermediate spill and map output files are 
>cleaned up when the mapper finishes. I want to see those intermediate files 
>and don't want them cleaned up. How can I achieve that?
>
>Thanks a lot
>
>On Mar 27, 2012, at 1:16 AM, "Harsh J-2 [via Hadoop 
>Common]"<ml-node+s472056n3860389...@n3.nabble.com> wrote:
>
>> Hello Aayush, 
>> 
>> Three things that'd help clear your confusion: 
>> 1. dfs.data.dir controls where HDFS blocks are to be stored. Set this 
>> to a partition1 path. 
>> 2. mapred.local.dir controls where intermediate task data goes. Set 
>> this to a partition2 path. 
>> 
>> > Furthermore, can someone also tell me how to save intermediate mapper 
>> > files(spill outputs) and where are they saved. 
>> 
>> Intermediate outputs are handled by the framework itself (There is no 
>> user/manual work involved), and are saved inside attempt directories 
>> under mapred.local.dir. 
>> 
>> On Tue, Mar 27, 2012 at 4:46 AM, aayush <[hidden email]> wrote: 
>> > I am a newbie to Hadoop and MapReduce. I am running a single-node Hadoop 
>> > setup. I have created 2 partitions on my HDD. I want the mapper 
>> > intermediate files (i.e. the spill files and the mapper output) to be 
>> > sent to a file system on partition1, whereas everything else, including 
>> > HDFS, should run on partition2. I am struggling to find the appropriate 
>> > parameters in the conf files. I understand that there is hadoop.tmp.dir 
>> > and mapred.local.dir but am not sure how to use what. I would really 
>> > appreciate it if someone could tell me exactly which parameters to 
>> > modify to achieve this goal. 
>> 
>> -- 
>> Harsh J 
>> 
>> 
>
>
>--
>View this message in context: 
>http://hadoop-common.472056.n3.nabble.com/Separating-mapper-intermediate-files-tp3859787p3861159.html
>Sent from the Users mailing list archive at Nabble.com.
>
>
