Set the input split size really low and you might get something. But I'd rather you fire up some *nix commands, pack that file onto itself a bunch of times, then put it back into HDFS and let 'er rip.

Sent from my mobile. Please excuse the typos.
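Something like the following is one way to do that -- a rough sketch, assuming the excite-small.log file from the thread below; the repeat count and the HDFS destination path are made up:

    # Concatenate the small file onto itself until it spans several
    # HDFS blocks (512 x ~208 KB is roughly 100 MB, i.e. more than one
    # default 64 MB block), then push the result back into HDFS.
    # The count and /user/mohit path are assumptions -- adjust to taste.
    for i in $(seq 1 512); do
      cat excite-small.log >> excite-big.log
    done
    hadoop fs -put excite-big.log /user/mohit/excite-big.log

With the default 64 MB block size the copy lands in two blocks, so the input format hands out at least two splits and you get more than one mapper.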
On 2011-05-26, at 4:56 PM, Mohit Anchlia <[email protected]> wrote:

> I think I understand that from the last 2 replies :) But my question is: can
> I change this configuration to, say, split the file into 250K so that
> multiple mappers can be invoked?
>
> On Thu, May 26, 2011 at 3:41 PM, James Seigel <[email protected]> wrote:
>> Have more data for it to process :)
>>
>> On 2011-05-26, at 4:30 PM, Mohit Anchlia wrote:
>>
>>> I ran a simple pig script on this file:
>>>
>>> -rw-r--r-- 1 root root 208348 May 26 13:43 excite-small.log
>>>
>>> that orders the contents by name. But it only created one mapper. How
>>> can I change this to distribute across multiple machines?
>>>
>>> On Thu, May 26, 2011 at 3:08 PM, jagaran das <[email protected]>
>>> wrote:
>>>> Hi Mohit,
>>>>
>>>> No. of maps - it depends on the total file size / block size.
>>>> No. of reducers - you can specify.
>>>>
>>>> Regards,
>>>> Jagaran
>>>>
>>>> ________________________________
>>>> From: Mohit Anchlia <[email protected]>
>>>> To: [email protected]
>>>> Sent: Thu, 26 May, 2011 2:48:20 PM
>>>> Subject: No. of Map and reduce tasks
>>>>
>>>> How can I tell how the map and reduce tasks were spread across the
>>>> cluster? I looked at the jobtracker web page but can't find that info.
>>>>
>>>> Also, can I specify how many map or reduce tasks I want to be launched?
>>>>
>>>> From what I understand, it's based on the number of input files
>>>> passed to Hadoop. So if I have 4 files, there will be 4 map tasks
>>>> launched, and the reducer is dependent on the HashPartitioner.
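For the 250K question above: lowering the maximum split size is the knob to turn. A minimal sketch, assuming a Hadoop 0.20-era cluster (where the property is named mapred.max.split.size) and a placeholder script name:

    # Ask for input splits of at most ~250 KB, so any file larger than
    # that gets handed to several mappers. script.pig is a stand-in for
    # the actual script; the property name is the 0.20-era one and
    # differs on newer Hadoop versions.
    pig -Dmapred.max.split.size=250000 script.pig

Note that by Jagaran's formula, the 208348-byte excite-small.log on its own still fits inside a single 250 KB split, hence one mapper; a smaller split size only buys extra mappers once the file is bigger than the split (e.g., after packing it together as above). Reducers are set from inside the script instead, with PARALLEL on the operator or SET default_parallel n.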
