Hi all, I wanted to get your help with a couple of questions that came up while looking at the Hadoop Comparator/Comparable architecture.
As I understand it, before each reducer operates on its keys, Hadoop sorts them. *Why does Hadoop need to do that?* If I implement my own class and intend to use it as a key, I must make instances of my class comparable. So I have two choices: I can implement WritableComparable, or I can register a WritableComparator for my class. A few questions:

1. Should I fail to do either, will the job fail?
2. If I register a WritableComparator that does not use the Comparable interface at all, does my key still need to implement WritableComparable?
3. If I don't register a comparator and my key implements WritableComparable, does that mean Hadoop will deserialize my keys twice (once for sorting, and once for reducing)?
4. What is RawComparator used for?

Thanks for your help!
Pony
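To make the two options concrete, here is a minimal, self-contained sketch in plain Java. The interfaces are local stand-ins for Hadoop's `org.apache.hadoop.io.WritableComparable` and the raw-comparison method of `org.apache.hadoop.io.RawComparator`, so the file compiles without Hadoop on the classpath; the class names `IntKey` and `IntKeyRawComparator` are made up for illustration. It shows (a) a key whose objects compare via `compareTo`, and (b) a raw comparator that orders the same keys directly in their serialized byte form, without deserializing them, which is the point of Hadoop's raw-comparison path.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Simplified stand-ins for Hadoop's interfaces (real code would use
// org.apache.hadoop.io.WritableComparable / RawComparator instead).
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}
interface WritableComparable<T> extends Writable, Comparable<T> {}

// Option 1: the key itself implements WritableComparable.
class IntKey implements WritableComparable<IntKey> {
    int value;
    IntKey() {}
    IntKey(int v) { value = v; }
    public void write(DataOutput out) throws IOException { out.writeInt(value); }
    public void readFields(DataInput in) throws IOException { value = in.readInt(); }
    public int compareTo(IntKey other) { return Integer.compare(value, other.value); }
}

// Option 2: a raw comparator that sorts serialized keys byte-by-byte.
// DataOutput.writeInt emits big-endian two's complement, so flipping the
// sign bit of the first byte yields correct signed ordering.
class IntKeyRawComparator {
    public int compare(byte[] b1, int s1, byte[] b2, int s2) {
        for (int i = 0; i < 4; i++) {
            int x = (b1[s1 + i] & 0xff) ^ (i == 0 ? 0x80 : 0);
            int y = (b2[s2 + i] & 0xff) ^ (i == 0 ? 0x80 : 0);
            if (x != y) return x - y;
        }
        return 0;
    }
}

public class Demo {
    static byte[] serialize(IntKey k) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        k.write(new DataOutputStream(bos));
        return bos.toByteArray();
    }
    public static void main(String[] args) throws IOException {
        IntKey a = new IntKey(-3), b = new IntKey(7);
        // Object-level comparison (keys must already be deserialized):
        System.out.println(a.compareTo(b) < 0);
        // Byte-level comparison (no deserialization of the keys needed):
        IntKeyRawComparator cmp = new IntKeyRawComparator();
        System.out.println(cmp.compare(serialize(a), 0, serialize(b), 0) < 0);
    }
}
```

Both comparisons agree on the ordering; the raw version just avoids materializing key objects during the sort, which is why a byte-level comparator can be much cheaper when keys are sorted many times.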
