Re: Num map task?

nguyenhuynh.mr Thu, 23 Apr 2009 19:10:43 -0700

Edward J. Yoon wrote:

> As far as I know, FileInputFormat.getSplits() returns a number of
> splits computed automatically from the number of files and blocks. BTW,
> what version of Hadoop/HBase are you using?
>
> I tried to test that code
> (http://wiki.apache.org/hadoop/Hbase/MapReduce) on my cluster (Hadoop
> 0.19.1 and Hbase 0.19.0). The number of input paths was 2, map tasks
> were 274.
>
> Below is my changed code for v0.19.0.
> ---
>   public JobConf createSubmittableJob(String[] args) {
>     JobConf c = new JobConf(getConf(), TestImport.class);
>     c.setJobName(NAME);
>     FileInputFormat.setInputPaths(c, args[0]);
>
>     c.set("input.table", args[1]);
>     c.setMapperClass(InnerMap.class);
>     c.setNumReduceTasks(0);
>     c.setOutputFormat(NullOutputFormat.class);
>     return c;
>   }
>
>
>
> On Thu, Apr 23, 2009 at 6:19 PM, nguyenhuynh.mr
> <[email protected]> wrote:
>   
>> Edward J. Yoon wrote:
>>
>>     
>>> How do you add the input paths?
>>>
>>> On Wed, Apr 22, 2009 at 5:09 PM, nguyenhuynh.mr
>>> <[email protected]> wrote:
>>>
>>>       
>>>> Edward J. Yoon wrote:
>>>>
>>>>
>>>>         
>>>>> Hi,
>>>>>
>>>>> In that case, the atomic unit of a split is a file, so you need to
>>>>> increase the number of files, or use TextInputFormat as below.
>>>>>
>>>>> jobConf.setInputFormat(TextInputFormat.class);
>>>>>
>>>>> On Wed, Apr 22, 2009 at 4:35 PM, nguyenhuynh.mr
>>>>> <[email protected]> wrote:
>>>>>
>>>>>
>>>>>           
>>>>>> Hi all!
>>>>>>
>>>>>>
>>>>>> I have an MR job that imports contents into HBase.
>>>>>>
>>>>>> The contents are text files in HDFS. I use map files to store the
>>>>>> local paths of the contents.
>>>>>>
>>>>>> Each content has its own map file (a map file is a text file in HDFS
>>>>>> that contains 1 line of info).
>>>>>>
>>>>>>
>>>>>> I created a maps directory to hold the map files, and this maps
>>>>>> directory is used as the input path for the job.
>>>>>>
>>>>>> When I run the job, the number of map tasks is the same as the number
>>>>>> of map files. Ex: I have 5 map files -> 5 map tasks.
>>>>>>
>>>>>> Therefore, the map phase is slow :(
>>>>>>
>>>>>> Why is the map phase slow when the number of map tasks is large and
>>>>>> equal to the number of files?
>>>>>>
>>>>>> * p/s: The jobs run on 3 nodes: 1 server and 2 slaves
>>>>>>
>>>>>> Please help me!
>>>>>> Thanks.
>>>>>>
>>>>>> Best,
>>>>>> Nguyen.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>             
>>>>>
>>>>>           
>>>> Currently, I use TextInputFormat as the InputFormat for the map phase.
>>>>
>>>>
>>>>         
>>>
>>> Thanks for your help!
>>>       
>> I use FileInputFormat to add the input paths.
>> Something like:
>>    FileInputFormat.setInputPaths(conf, new Path("dir"));
>>
>> Here conf is the job's JobConf, and "dir" is a directory that contains
>> the input files.
>>
>> Best,
>> Nguyen
>>
>>
>>
>>     
Thanks!


I am using Hadoop version 0.18.2.

Cheers,
Nguyen.
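
For anyone reading this thread later: as far as I understand it, the old
mapred FileInputFormat never lets two files share a split, and it cuts each
file into pieces of roughly max(minSize, min(totalSize / requestedMaps,
blockSize)) bytes. The sketch below is only a simplified model of that rule,
not Hadoop source code. The class name SplitEstimate, the method countSplits
and its parameters are made up for illustration, and empty files are ignored.

```java
// Simplified model of how the old org.apache.hadoop.mapred.FileInputFormat
// appears to size its splits. Illustrative only; all names are hypothetical.
final class SplitEstimate {
    // The last piece of a file may run up to 10% over the split size.
    private static final double SPLIT_SLOP = 1.1;

    static int countSplits(long[] fileLengths, long blockSize,
                           int requestedMaps, long minSplitSize) {
        long totalSize = 0;
        for (long len : fileLengths) {
            totalSize += len;
        }
        // Target bytes per split if the requested map count were honoured.
        long goalSize = totalSize / Math.max(1, requestedMaps);
        // Never below the minimum (at least 1 byte), never above one block.
        long splitSize = Math.max(Math.max(1, minSplitSize),
                                  Math.min(goalSize, blockSize));
        int splits = 0;
        for (long len : fileLengths) {
            long remaining = len;
            // Cut full-size pieces while clearly more than one piece remains.
            while ((double) remaining / splitSize > SPLIT_SLOP) {
                splits++;
                remaining -= splitSize;
            }
            // Whatever is left of this file becomes its own split.
            if (remaining > 0) {
                splits++;
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Five one-line map files, as in the original question.
        long[] tiny = {100, 100, 100, 100, 100};
        System.out.println(countSplits(tiny, 64 * mb, 2, 1));
        // Two 200 MB files with 64 MB blocks.
        long[] big = {200 * mb, 200 * mb};
        System.out.println(countSplits(big, 64 * mb, 2, 1));
    }
}
```

Under this model, five one-line files always give five splits no matter how
many maps are requested, which matches the "5 map files -> 5 map tasks"
observation, while large files are split per block.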
