The combiner needs sorted input. On Mon, Jul 26, 2010 at 1:46 PM, Alex Kozlov <[email protected]> wrote:
> Hi Ravi, > > Whether a sort is required is still a point of debate: the primary reason > is to collect the entries with the same key, but one can implement MapReduce > with hash deduping. The performance advantages/disadvantages are still a > subject of debate. > > If you don't need sorting, you can always implement map-side aggregation > though and potentially set the # of reducers to 0. There is no potential > risk, but if you want to aggregate results across different mappers you'll > get back to the original problem. > > Alex K > > On Mon, Jul 26, 2010 at 1:32 PM, Chinni, Ravi <[email protected]>wrote: > >> I have an MR application that is running fine except for the >> performance. Increasing the number of data nodes is not an option to me. >> >> >> >> Looking at the source code of MR framework, I noticed that the partitioned >> output of each mapper is sorted (MapTask.java), and on the reduce side >> partitions from various mappers are merged (ReduceTask.java) before running >> the reduce step. Functionally, reducers in my application does not require >> data to be in sorted order and getting rid of the sort and merge steps in >> the framework will help my application. >> >> >> >> Does anyone know, why the sort and merge of intermediate data is being >> done by the framework? Is there anything - MR functional concepts, framework >> design etc. - that will need the sort and merge of intermediate data? I want >> to give a shot in getting rid of the sort and merge steps in the framework >> and want to know of any potential risks involved. >> >> >> >> Any input is appreciated. >> >> >> >> Thanks, >> >> Ravi >> >> >> >> >> _____________________________________________________________________________ >> >> ATTENTION: >> >> The information contained in this message (including any files transmitted >> with this message) may contain proprietary, trade secret or other >> confidential and/or legally privileged information. Any pricing information >> contained in this message or in any files transmitted with this message is >> always confidential and cannot be shared with any third parties without >> prior written approval from Syncsort. This message is intended to be read >> only by the individual or entity to whom it is addressed or by their >> designee. If the reader of this message is not the intended recipient, you >> are on notice that any use, disclosure, copying or distribution of this >> message, in any form, is strictly prohibited. If you have received this >> message in error, please immediately notify the sender and/or Syncsort and >> destroy all copies of this message in your possession, custody or control. >> > >
