On Jun 28, 2006, at 9:06 PM, Arun C Murthy wrote:
All,
<background>
I have a *map* which does some processing and then a *reduce*
which sorts the results.
TextInputFormat & TextOutputFormat are the input/output formats
respectively.
However the *sort* I want to perform is as follows:
I want to sort output by 'comparing' 'columns' of 'key's in the
Comparator and not the entire 'key'.
E.g. spec: column1, column0 is the sort-spec.
aaa ccc ggg
bbb aaa hhh
should result in:
bbb aaa hhh
aaa ccc ggg
</background>
I can't seem to find an 'elegant' way to do this via the MR
framework i.e. I can't seem to be able to set a *policy* (i.e. set the
sort-spec) for the WritableComparable via the framework. Is there
something I'm missing? In essence I probably need a *configure*
callback for the WritableComparable interface too? Is there a better
way? Or is this outside the scope of the framework?
There is a way to do it, but it isn't surprising that you missed it.
When JobConf creates a new instance of an object that is Configurable,
it passes that object the Configuration. So, if you make a
ConfigurableComparator that extends WritableComparator and implements
Configurable, its setConf method will be called with the job's
JobConf. Now do something like:
    JobConf conf = new JobConf();
    conf.set("my.sort.order", "1,0,2");
    conf.setOutputKeyComparatorClass(ConfigurableComparator.class);
With that, the sort order should reach the comparator through setConf.
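
For the comparison itself, here is a minimal sketch in plain Java of
what such a comparator might do once it has the sort-spec. It is only
an illustration: ColumnComparator and fromSpec are hypothetical names,
nothing here depends on Hadoop, and it assumes keys are lines of
whitespace-separated columns. In the setup described above, the spec
string would presumably come from conf.get("my.sort.order") inside
setConf, and the keys would have to be decoded from their Writable
form before comparing; neither step is shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Hadoop: compares whitespace-separated
// lines column by column, in the order given by a sort-spec such as "1,0".
class ColumnComparator implements Comparator<String> {
    private final int[] columns;

    ColumnComparator(int[] columns) {
        this.columns = columns.clone();
    }

    // Parse a comma-separated spec like "1,0,2" into column indices.
    // No validation beyond Integer.parseInt; indices are assumed non-negative.
    static ColumnComparator fromSpec(String spec) {
        String[] parts = spec.split(",");
        int[] cols = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            cols[i] = Integer.parseInt(parts[i].trim());
        }
        return new ColumnComparator(cols);
    }

    @Override
    public int compare(String a, String b) {
        String[] fa = a.trim().split("\\s+");
        String[] fb = b.trim().split("\\s+");
        for (int col : columns) {
            // Assumption: a missing column is treated as an empty string.
            String ca = col < fa.length ? fa[col] : "";
            String cb = col < fb.length ? fb[col] : "";
            int c = ca.compareTo(cb);
            if (c != 0) {
                return c;
            }
        }
        // All listed columns are equal.
        return 0;
    }

    public static void main(String[] args) {
        // The two rows from the example in the original question.
        List<String> rows =
            new ArrayList<>(Arrays.asList("aaa ccc ggg", "bbb aaa hhh"));
        rows.sort(ColumnComparator.fromSpec("1,0"));
        rows.forEach(System.out::println);
    }
}
```

With the spec "1,0", the rows are ordered by column 1 first and by
column 0 on ties, so "bbb aaa hhh" sorts ahead of "aaa ccc ggg", as in
the example from the question.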
-- Owen