What if I had multiple files in the input directory? Would Hadoop then fire parallel map tasks?
On Thu, May 26, 2011 at 7:21 PM, jagaran das <jagaran_...@yahoo.co.in> wrote:
> If you give really low-size files, then the benefit of Hadoop's big block size
> goes away.
> Instead, try merging files.
>
> Hope that helps
>
>
> ________________________________
> From: James Seigel <ja...@tynt.com>
> To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>
> Sent: Thu, 26 May, 2011 6:04:07 PM
> Subject: Re: No. of Map and reduce tasks
>
> Set the input split size really low; you might get something.
>
> I'd rather you fire up some *nix commands, pack that file together onto itself
> a bunch of times, then put it back into HDFS and let 'er rip.
>
> Sent from my mobile. Please excuse the typos.
>
> On 2011-05-26, at 4:56 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
>
>> I think I understand that from the last 2 replies :) But my question is: can
>> I change this configuration to, say, split the file into 250K chunks so that
>> multiple mappers can be invoked?
>>
>> On Thu, May 26, 2011 at 3:41 PM, James Seigel <ja...@tynt.com> wrote:
>>> have more data for it to process :)
>>>
>>>
>>> On 2011-05-26, at 4:30 PM, Mohit Anchlia wrote:
>>>
>>>> I ran a simple pig script on this file:
>>>>
>>>> -rw-r--r-- 1 root root 208348 May 26 13:43 excite-small.log
>>>>
>>>> that orders the contents by name. But it only created one mapper. How
>>>> can I change this to distribute across multiple machines?
>>>>
>>>> On Thu, May 26, 2011 at 3:08 PM, jagaran das <jagaran_...@yahoo.co.in> wrote:
>>>>> Hi Mohit,
>>>>>
>>>>> No. of Maps - It depends on the Total File Size / Block Size.
>>>>> No. of Reducers - You can specify.
>>>>>
>>>>> Regards,
>>>>> Jagaran
>>>>>
>>>>>
>>>>> ________________________________
>>>>> From: Mohit Anchlia <mohitanch...@gmail.com>
>>>>> To: common-user@hadoop.apache.org
>>>>> Sent: Thu, 26 May, 2011 2:48:20 PM
>>>>> Subject: No. of Map and reduce tasks
>>>>>
>>>>> How can I tell how the map and reduce tasks were spread across the
>>>>> cluster? I looked at the jobtracker web page but can't find that info.
>>>>>
>>>>> Also, can I specify how many map or reduce tasks I want to be launched?
>>>>>
>>>>> From what I understand, it's based on the number of input files
>>>>> passed to Hadoop. So if I have 4 files, there will be 4 map tasks
>>>>> launched, and the reducer is dependent on the HashPartitioner.
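
For reference, the single mapper in the Pig run above follows from the rule of thumb stated in the thread: 208,348 bytes divided by an (assumed) default 64 MB block size gives one input split, hence one map task. Below is a minimal Pig sketch of the two suggestions made above, lowering the maximum split size and specifying the number of reducers explicitly. The property names are 0.20-era Hadoop settings that vary across Hadoop/Pig versions, and the field schema for excite-small.log is assumed, so treat this as an illustration rather than exact syntax for your setup:

    -- Assumed schema for excite-small.log (user, time, query), tab-delimited; adjust as needed.
    set mapred.max.split.size 262144;   -- cap each split at ~256 KB (property name varies by version)
    set mapred.min.split.size 1;        -- allow splits smaller than the block size

    logs    = LOAD 'excite-small.log' AS (user:chararray, time:long, query:chararray);
    ordered = ORDER logs BY user PARALLEL 4;   -- PARALLEL sets the number of reduce tasks
    STORE ordered INTO 'excite-ordered';

Even with the split size lowered, for a ~200 KB file the approach Jagaran suggests, packing many small files into one larger file before loading it into HDFS, is usually the better fix than forcing tiny splits.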