Hi Ravi,

Whether a sort is required is still a point of debate. Its primary purpose is
to bring together all entries that share the same key, but one can also
implement MapReduce with hash-based grouping (deduping) instead. The relative
performance advantages and disadvantages of the two approaches are still an
open question.
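To make the trade-off concrete, here is a toy, single-process sketch in Java. It is not Hadoop code, and `GroupingSketch`, `Pair`, `sortMergeGroup` and `hashGroup` are hypothetical names chosen for illustration. The first method sorts each mapper's output and k-way merges the sorted runs, roughly mirroring what the framework does in MapTask/ReduceTask; the second groups with a hash table and never sorts.

```java
import java.util.*;

/** Toy illustration (not Hadoop code): two ways to bring equal keys together. */
public class GroupingSketch {

    /** A key/value pair emitted by a hypothetical mapper. */
    record Pair(String key, int value) {}

    /**
     * Sort-based grouping: sort each mapper's output by key, then k-way merge
     * the sorted runs so that all values for a key arrive adjacent to each other.
     */
    static Map<String, List<Integer>> sortMergeGroup(List<List<Pair>> mapperOutputs) {
        List<List<Pair>> runs = new ArrayList<>();
        for (List<Pair> out : mapperOutputs) {
            List<Pair> run = new ArrayList<>(out);
            run.sort(Comparator.comparing(Pair::key)); // the "map-side sort"
            runs.add(run);
        }
        // Heap entries are {runIndex, positionInRun}, ordered by key, ties broken by run index.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            Comparator.comparing((int[] e) -> runs.get(e[0]).get(e[1]).key())
                      .thenComparingInt(e -> e[0]));
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) heap.add(new int[] {r, 0});
        }
        Map<String, List<Integer>> groups = new LinkedHashMap<>(); // keeps merge (key) order
        String currentKey = null;
        List<Integer> currentValues = null;
        while (!heap.isEmpty()) {
            int[] e = heap.poll();
            Pair p = runs.get(e[0]).get(e[1]);
            if (!p.key().equals(currentKey)) {
                // Key changed: the previous group is complete, so a reducer could run on it now.
                currentKey = p.key();
                currentValues = new ArrayList<>();
                groups.put(currentKey, currentValues);
            }
            currentValues.add(p.value());
            if (e[1] + 1 < runs.get(e[0]).size()) heap.add(new int[] {e[0], e[1] + 1});
        }
        return groups;
    }

    /** Hash-based grouping: no sort or merge; keys come out in no particular order. */
    static Map<String, List<Integer>> hashGroup(List<List<Pair>> mapperOutputs) {
        Map<String, List<Integer>> groups = new HashMap<>();
        for (List<Pair> out : mapperOutputs) {
            for (Pair p : out) {
                groups.computeIfAbsent(p.key(), k -> new ArrayList<>()).add(p.value());
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<List<Pair>> mapperOutputs = List.of(
            List.of(new Pair("b", 1), new Pair("a", 2), new Pair("b", 3)),
            List.of(new Pair("a", 4), new Pair("c", 5)));
        System.out.println(sortMergeGroup(mapperOutputs)); // {a=[2, 4], b=[1, 3], c=[5]}
        System.out.println(hashGroup(mapperOutputs));      // same groups, key order not guaranteed
    }
}
```

Both methods produce the same key groups here. As general background rather than a measured result: the merge can hand each group to a reducer as soon as its key changes and can work from sorted runs spilled to disk, whereas the hash table has to keep every group in memory until all input has been read, since any key may still reappear.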

If you don't need sorting, though, you can always implement map-side
aggregation and potentially set the number of reducers to 0. There is no risk
in doing so, but if you want to aggregate results across different mappers,
you are back to the original problem.
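A minimal sketch of the map-side approach, again as a toy single-process Java program rather than Hadoop code (`MapSideAggregationSketch` and `runMapper` are hypothetical names). Each "mapper" aggregates its own split in a HashMap, a pattern often called in-mapper combining, which assumes the distinct keys of one split fit in memory. With zero reducers those partial results are the final output, so a key seen by two mappers shows up twice.

```java
import java.util.*;

/** Toy illustration (not Hadoop code) of map-side aggregation with zero reducers. */
public class MapSideAggregationSketch {

    /**
     * Hypothetical mapper: counts words in its own input split with an in-memory
     * HashMap and returns one partial count per distinct word. In a Hadoop Mapper
     * this emission would typically happen in cleanup().
     */
    static Map<String, Integer> runMapper(List<String> split) {
        Map<String, Integer> partial = new HashMap<>();
        for (String line : split) {
            for (String word : line.split("\\s+")) {
                if (!word.isEmpty()) partial.merge(word, 1, Integer::sum);
            }
        }
        return partial;
    }

    public static void main(String[] args) {
        List<List<String>> splits = List.of(List.of("a b a"), List.of("b c"));

        // Zero reducers: each mapper's partial result is final output.
        // Nothing is sorted, merged, or combined across mappers.
        List<Map<String, Integer>> outputs = new ArrayList<>();
        for (List<String> split : splits) outputs.add(runMapper(split));
        System.out.println(outputs); // "b" appears once per mapper that saw it

        // A global count per key needs a cross-mapper grouping step again,
        // which is exactly what the reduce phase (and its sort/merge) provides.
        Map<String, Integer> global = new TreeMap<>();
        for (Map<String, Integer> out : outputs) {
            out.forEach((k, v) -> global.merge(k, v, Integer::sum));
        }
        System.out.println(global); // {a=2, b=2, c=1}
    }
}
```

In a real job the zero-reducer setting corresponds to `job.setNumReduceTasks(0)` on `org.apache.hadoop.mapreduce.Job`; the Hadoop MapReduce tutorial notes that map outputs then go directly to the output path and are not sorted by the framework.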

Alex K

On Mon, Jul 26, 2010 at 1:32 PM, Chinni, Ravi <[email protected]> wrote:

> I have an MR application that runs correctly, but its performance is poor.
> Increasing the number of data nodes is not an option for me.
>
> Looking at the source code of the MR framework, I noticed that the partitioned
> output of each mapper is sorted (MapTask.java), and on the reduce side the
> partitions from the various mappers are merged (ReduceTask.java) before the
> reduce step runs. Functionally, the reducers in my application do not require
> their input to be in sorted order, so getting rid of the sort and merge steps
> in the framework would help my application.
>
> Does anyone know why the framework sorts and merges the intermediate data?
> Is there anything (MR functional concepts, framework design, etc.) that
> requires the sort and merge of intermediate data? I want to try removing the
> sort and merge steps from the framework and would like to know of any
> potential risks involved.
>
> Any input is appreciated.
>
> Thanks,
> Ravi
