Hi Arun,

Suppose I am running a simple wordcount and the map phase is over. After the shuffle, the inputs to each reducer arrive sorted by key within its partition. I want to disable this sorting.
Take the same wordcount (wc) case. I don't mind the order in which my reducer receives the keys of a single partition. I guess Hadoop does an external sort to produce that order, and that sort is what I want to disable.

Thanks,
jS

On Sun, Sep 11, 2011 at 7:03 AM, Arun C Murthy <[email protected]> wrote:

> The point of a 'reduce phase' is to aggregate keys from different maps
> (i.e. all inputs).
>
> I'm not sure what you are trying to do, but a use-case will help.
>
> IAC, the only way to achieve what you are trying to do is to run two jobs
> with the first a map-only job (i.e. #reduces = 0).
>
> Arun
>
> On Sep 10, 2011, at 10:19 PM, john smith wrote:
>
> > Hey,
> >
> > I have reduce phases too. But for each reduce, I don't need sorted input
> > (map-output for that corresponding reduce task).
> > Setting #red to 0 completely removes the reduce phase.
> >
> > Am I missing something?
> >
> > Thanks,
> >
> > On Sun, Sep 11, 2011 at 12:18 AM, Arun C Murthy <[email protected]> wrote:
> >
> >> Run a map-only job with #reduces set to 0.
> >>
> >> Arun
> >>
> >> On Sep 10, 2011, at 2:06 AM, john smith wrote:
> >>
> >>> Hi,
> >>>
> >>> Some of the MR jobs I run don't need sorting of map-output in each
> >>> partition. Is there some way I can disable it?
> >>>
> >>> Any help?
> >>>
> >>> Thanks
> >>> jS
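
To make the wordcount use case in this thread concrete, here is a minimal sketch in plain Java of why the counts do not depend on the order of keys within a partition. It does not use the Hadoop API and does not change how Hadoop delivers keys to a reducer. The class name UnsortedWordCount, the method countUnsorted, and the sample words are all hypothetical, chosen only for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UnsortedWordCount {

    // Hypothetical stand-in for the records of one partition, consumed in
    // arbitrary (unsorted) order. A hash map groups equal keys, so no sort
    // of the input is needed to produce the per-word totals.
    static Map<String, Integer> countUnsorted(List<String> partitionWords) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : partitionWords) {
            // Add 1 to the running total for this word (starting from 1 if absent).
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed sample data: words of one partition in no particular order.
        List<String> partition =
                Arrays.asList("pear", "apple", "pear", "fig", "apple", "pear");
        Map<String, Integer> counts = countUnsorted(partition);
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

The trade-off in this sketch is memory: the hash map holds every distinct key of the partition at once, whereas sorted input lets a reducer handle one key at a time.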
