Hi Ravi,

Whether the sort is required is still debated. Its primary purpose is to bring all entries with the same key together so that a reducer sees each key's values as one group, but MapReduce can also be implemented with hash-based grouping instead of a sort. The performance trade-offs between the two approaches are not settled.
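To make the sort-versus-hash point concrete, here is a minimal Python sketch. It is not Hadoop code, and the function names `group_by_sort` and `group_by_hash` are made up for illustration. It only shows that both strategies produce the same key-to-values groups, and that the hash version gives up the key ordering.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def group_by_sort(pairs):
    """Sort by key, then collect each run of equal keys (sort-and-merge style)."""
    ordered = sorted(pairs, key=itemgetter(0))  # stable sort: value order per key is kept
    return {key: [value for _, value in run]
            for key, run in groupby(ordered, key=itemgetter(0))}


def group_by_hash(pairs):
    """Collect values per key in a hash table, with no sorting at all."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


# Hypothetical intermediate (key, value) pairs as emitted by mappers.
pairs = [("b", 1), ("a", 2), ("b", 3), ("c", 4), ("a", 5)]

sorted_groups = group_by_sort(pairs)
hashed_groups = group_by_hash(pairs)

# Same groups either way; only the iteration order of the keys differs.
print(sorted_groups == hashed_groups)
print(list(sorted_groups), list(hashed_groups))
```

One design difference the sketch hides: the hash table here holds every group in memory, whereas a sort-based approach can spill sorted runs to disk and merge them, which is part of why the trade-off is not clear-cut.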
If you don't need sorting, you can always implement map-side aggregation and potentially set the number of reducers to 0. There is no risk in doing that, but if you need to aggregate results across different mappers, you are back to the original problem.

Alex K

On Mon, Jul 26, 2010 at 1:32 PM, Chinni, Ravi <[email protected]> wrote:

> I have an MR application that is running fine except for the
> performance. Increasing the number of data nodes is not an option to me.
>
> Looking at the source code of MR framework, I noticed that the partitioned
> output of each mapper is sorted (MapTask.java), and on the reduce side
> partitions from various mappers are merged (ReduceTask.java) before running
> the reduce step. Functionally, reducers in my application do not require
> data to be in sorted order and getting rid of the sort and merge steps in
> the framework will help my application.
>
> Does anyone know, why the sort and merge of intermediate data is being done
> by the framework? Is there anything - MR functional concepts, framework
> design etc. - that will need the sort and merge of intermediate data? I want
> to give a shot in getting rid of the sort and merge steps in the framework
> and want to know of any potential risks involved.
>
> Any input is appreciated.
>
> Thanks,
> Ravi
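The map-side aggregation suggestion in the reply can be illustrated with a small Python simulation. This is only a sketch under assumptions: the word-count workload, the input splits, and the names `map_with_local_aggregation` and `merge_partials` are hypothetical, and no Hadoop API is used. In a real job the reducer count would be set with `setNumReduceTasks(0)`, which skips the shuffle and sort entirely.

```python
from collections import Counter


def map_with_local_aggregation(split):
    """One simulated map task: aggregate counts within its own split only."""
    counts = Counter()
    for line in split:
        counts.update(line.split())
    return dict(counts)


def merge_partials(partials):
    """Cross-mapper aggregation: the step that a map-only job cannot do."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # adds counts key by key
    return dict(total)


# Two hypothetical input splits, one per map task.
splits = [["a b a"], ["b c", "a"]]

# With zero reducers, each mapper's output is final and written separately.
per_mapper = [map_with_local_aggregation(split) for split in splits]

# Keys "a" and "b" appear in both outputs, so the totals are still partial.
shared_keys = set(per_mapper[0]) & set(per_mapper[1])

# Combining them requires bringing equal keys together again.
global_counts = merge_partials(per_mapper)
print(per_mapper, shared_keys, global_counts)
```

The per-mapper outputs are correct on their own, but any key that occurs in more than one split still needs a step that collects equal keys in one place, which is the job the sort and merge (or a hash-based equivalent) performs.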
