What if I had multiple files in the input directory? Would Hadoop then
fire parallel map tasks?
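
With the stock FileInputFormat, each file in the input directory becomes at
least one split, and each split gets its own map task, so several files do run
through mappers in parallel (up to the cluster's map slots). A minimal sketch
using the new mapreduce API; the paths and class name are made up for
illustration:

    // Rough sketch (new mapreduce API, Hadoop 0.20-era); input/output paths
    // and the class name are hypothetical. With the default FileInputFormat,
    // every file under the input directory contributes at least one split,
    // and each split is handled by its own map task.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class MultiFileJob {
      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "multi-file-example");
        job.setJarByClass(MultiFileJob.class);
        // Pointing at a directory picks up every file in it; each file
        // yields at least one input split, i.e. at least one map task.
        FileInputFormat.addInputPath(job, new Path("/user/hadoop/input"));
        FileOutputFormat.setOutputPath(job, new Path("/user/hadoop/output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }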


On Thu, May 26, 2011 at 7:21 PM, jagaran das <jagaran_...@yahoo.co.in> wrote:
> If you feed it really small files, then the benefit of Hadoop's "Big Block Size"
> goes away.
> Instead, try merging the files.
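
One way to do that merge without leaving HDFS is FileUtil.copyMerge (available
in the 0.20/1.x line); a hedged sketch, with made-up paths:

    // Hedged sketch: concatenate all the small files under one HDFS directory
    // into a single large file, so splits line up with the big block size.
    // Paths are hypothetical.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class MergeSmallFiles {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Copies every file under the source directory into one target file,
        // appending "\n" after each source; the sources are left in place.
        FileUtil.copyMerge(fs, new Path("/user/hadoop/small-files"),
                           fs, new Path("/user/hadoop/merged/big.log"),
                           false, conf, "\n");
      }
    }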
>
> Hope that helps
>
>
>
> ________________________________
> From: James Seigel <ja...@tynt.com>
> To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>
> Sent: Thu, 26 May, 2011 6:04:07 PM
> Subject: Re: No. of Map and reduce tasks
>
> Set the input split size really low, and you might get something.
>
> I'd rather you fire up some *nix commands and pack together that file
> onto itself a bunch of times and then put it back into HDFS and let 'er
> rip.
>
> Sent from my mobile. Please excuse the typos.
>
> On 2011-05-26, at 4:56 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
>
>> I think I understand that from the last 2 replies :)  But my question is:
>> can I change this configuration to, say, split the file into 250K chunks so
>> that multiple mappers can be invoked?
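
For a plain MapReduce job you can cap the maximum split size, which forces a
larger file to be cut into more splits (and therefore more map tasks). A rough
sketch with made-up paths; note that a ~208 KB file still fits in one 250 KB
split, and Pig builds its own jobs, so the script itself needs Pig's own knobs,
but the underlying idea is the same:

    // Hedged sketch: cap splits at 250 KB so anything bigger than that is
    // broken into several map tasks. Paths and the job name are illustrative.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SmallSplitJob {
      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "small-split-example");
        job.setJarByClass(SmallSplitJob.class);
        FileInputFormat.addInputPath(job, new Path("/user/hadoop/excite-small.log"));
        // Splits will be at most 250 KB; a 208 KB file is still one split,
        // but a larger (or self-concatenated) file gets several map tasks.
        FileInputFormat.setMaxInputSplitSize(job, 250L * 1024);
        FileOutputFormat.setOutputPath(job, new Path("/user/hadoop/out"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }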
>>
>> On Thu, May 26, 2011 at 3:41 PM, James Seigel <ja...@tynt.com> wrote:
>>> have more data for it to process :)
>>>
>>>
>>> On 2011-05-26, at 4:30 PM, Mohit Anchlia wrote:
>>>
>>>> I ran a simple pig script on this file:
>>>>
>>>> -rw-r--r-- 1 root root   208348 May 26 13:43 excite-small.log
>>>>
>>>> that orders the contents by name. But it only created one mapper. How
>>>> can I change this to distribute across multiple machines?
>>>>
>>>> On Thu, May 26, 2011 at 3:08 PM, jagaran das <jagaran_...@yahoo.co.in> wrote:
>>>>> Hi Mohit,
>>>>>
>>>>> No. of Maps - it depends on Total File Size / Block Size.
>>>>> No. of Reducers - you can specify.
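
In other words, the reducer count is something you set on the job, while the
map count falls out of the input splits. A tiny sketch, with a made-up class
name:

    // Hedged sketch: reducers are chosen explicitly; the number of map tasks
    // is derived from the input splits (roughly total input size / block size).
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ReducerCountExample {
      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "reducer-count-example");
        // You pick the number of reduce tasks directly:
        job.setNumReduceTasks(4);
        // There is no equivalent hard setting for map tasks; that count comes
        // from how the input is split.
      }
    }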
>>>>>
>>>>> Regards,
>>>>> Jagaran
>>>>>
>>>>>
>>>>>
>>>>> ________________________________
>>>>> From: Mohit Anchlia <mohitanch...@gmail.com>
>>>>> To: common-user@hadoop.apache.org
>>>>> Sent: Thu, 26 May, 2011 2:48:20 PM
>>>>> Subject: No. of Map and reduce tasks
>>>>>
>>>>> How can I tell how the map and reduce tasks were spread across the
>>>>> cluster? I looked at the JobTracker web page but can't find that info.
>>>>>
>>>>> Also, can I specify how many map or reduce tasks I want to be launched?
>>>>>
>>>>> From what I understand, it's based on the number of input files
>>>>> passed to Hadoop. So if I have 4 files there will be 4 map tasks
>>>>> launched, and the reducer is dependent on the HashPartitioner.
>>>>>
>>>
>>>
>
