Re: Quick Clarification of sort mechanism

Jeff Zhang Fri, 15 Jan 2010 19:59:09 -0800

Hi Rob,

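Sorting is an internal mechanism in Hadoop: the framework always sorts the
map output by key before it reaches the reduce step, which is why you
don't see it specified anywhere in the reduce method.
If you want to sort the result by count, you could start a second job that
takes the output of the first job as its input and uses the count as the
key and the word as the value.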
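
To illustrate the idea, here is a rough sketch in plain Java (it does not
use the Hadoop API, and the class and method names are made up for this
example). It assumes the first job wrote lines of the form "word<TAB>count".
The swap step stands in for the second job's map, and the list sort stands
in for the key sort that the framework does for you:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone simulation of the second job; not Hadoop code.
public class SortByCount {

    // "Map" step of the second job: turn one "word<TAB>count" line
    // (assumed output format of the first job) into a (count, word)
    // pair, so that the count becomes the key.
    static Map.Entry<Integer, String> swap(String line) {
        String[] parts = line.split("\t");
        return new SimpleEntry<>(Integer.parseInt(parts[1].trim()), parts[0]);
    }

    // Stand-in for the framework's sort: order the pairs by key, ascending.
    static List<Map.Entry<Integer, String>> secondPass(List<String> firstJobOutput) {
        List<Map.Entry<Integer, String>> pairs = new ArrayList<>();
        for (String line : firstJobOutput) {
            pairs.add(swap(line));
        }
        pairs.sort(Map.Entry.comparingByKey());
        return pairs;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the first job's output.
        List<String> firstJobOutput =
                Arrays.asList("apple\t3", "banana\t7", "cherry\t1");
        for (Map.Entry<Integer, String> e : secondPass(firstJobOutput)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

In a real second job you would only write the swap as the map function and
let Hadoop do the sorting. Note that the keys come out in ascending order,
so the most frequent words are last unless you supply a descending
comparator, and with more than one reducer each output file is only sorted
within itself.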




On Fri, Jan 15, 2010 at 2:42 PM, Rob Stewart <[email protected]> wrote:

> Hi,
>
> I am having a look at the WordCount java example here:
>
> http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#Walk-through
>
> I am wanting a word count application that, instead of sorting by key
> (alphabetically by word), I want to sort by the count (frequency) of the
> words.
>
> I can't see in the reduce method in the above example where exactly the
> key/values get specified to order by key alphabetically? Or how I can
> override this to sort by the value of the final reduce (i.e. by the
> frequency).
>
> Thanks,
>
> Rob Stewart
>



-- 
Best Regards

Jeff Zhang
