Note that the sort will be very slow unless you also define a RawComparator that compares the keys in their serialized form, so they don't have to be deserialized for every comparison. Look at IntWritable for an example of how to do it.
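To give a rough idea of what comparing the serialized forms looks like, here is a plain-Java sketch that follows the same pattern as IntWritable's comparator: it reads the ints straight out of the byte arrays instead of building key objects. The two-int key layout, the class name RawIntPairComparator and the serialize helper are made up for illustration. In a real job the compare method would live in a class implementing Hadoop's RawComparator (usually by extending WritableComparator) and be registered for your key class, the way IntWritable does with WritableComparator.define.

```java
import java.nio.ByteBuffer;

// Plain-Java sketch only; it does not depend on Hadoop.
// Hypothetical key layout: two big-endian ints (first, second), 8 bytes total.
class RawIntPairComparator {

    // Stand-in for the key's write() method: produces the serialized form.
    static byte[] serialize(int first, int second) {
        return ByteBuffer.allocate(8).putInt(first).putInt(second).array();
    }

    // Decode a big-endian int directly from the buffer at the given offset.
    static int readInt(byte[] b, int off) {
        return ((b[off] & 0xff) << 24)
             | ((b[off + 1] & 0xff) << 16)
             | ((b[off + 2] & 0xff) << 8)
             | (b[off + 3] & 0xff);
    }

    // Same parameter shape as RawComparator.compare: (buffer, start, length) per key.
    // Orders by the first int, then by the second, without creating any objects.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int c = Integer.compare(readInt(b1, s1), readInt(b2, s2));
        if (c != 0) {
            return c;
        }
        return Integer.compare(readInt(b1, s1 + 4), readInt(b2, s2 + 4));
    }
}
```

Whatever order the raw comparison produces has to agree with the key's compareTo, otherwise the sort and any code that compares deserialized keys will disagree.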
You need to define a reasonable hashCode because the default partitioner uses it to decide which reduce each key is sent to. If you define your own partitioner, you could, for instance, have all of the keys with the same first string go to the same reduce. And yes, the function you need to define, assuming you don't have a RawComparator, is compareTo, not equals.
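Here is a minimal plain-Java sketch of those pieces, assuming a key made of two strings. The class name StringPairKey and the two static partition helpers are hypothetical; in Hadoop the key would also implement WritableComparable, and the partitioning logic would go in a Partitioner's getPartition method. The modulo formula is the one the default hash partitioner uses.

```java
import java.util.Objects;

// Plain-Java sketch only; it does not depend on Hadoop.
// Hypothetical composite key made of two strings.
class StringPairKey implements Comparable<StringPairKey> {
    final String first;
    final String second;

    StringPairKey(String first, String second) {
        this.first = first;
        this.second = second;
    }

    // The sort order comes from compareTo: first string, then second string.
    @Override
    public int compareTo(StringPairKey other) {
        int c = first.compareTo(other.first);
        return c != 0 ? c : second.compareTo(other.second);
    }

    // Kept consistent with compareTo and hashCode.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof StringPairKey)) {
            return false;
        }
        StringPairKey k = (StringPairKey) o;
        return first.equals(k.first) && second.equals(k.second);
    }

    // Content-based hash, so equal keys always land in the same partition.
    @Override
    public int hashCode() {
        return Objects.hash(first, second);
    }

    // What hash partitioning does with the full key's hashCode.
    static int defaultPartition(StringPairKey key, int numReduces) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduces;
    }

    // Custom alternative: hash only the first string, so every key
    // sharing that string goes to the same reduce.
    static int firstStringPartition(StringPairKey key, int numReduces) {
        return (key.first.hashCode() & Integer.MAX_VALUE) % numReduces;
    }
}
```

With firstStringPartition, the keys ("a", "x") and ("a", "y") always get the same partition number, while defaultPartition makes no such promise.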
