Hi,
I am very new to Hadoop and I'm hoping someone can help me do a two-column
sort.
For my input, I have lines with 3 columns. I would like to sort the first
column by string ascending
and the second column by integer descending.
The listing below shows an example input and expected output.
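To make the intended ordering concrete, here is a minimal plain-Java sketch (no Hadoop involved) of the comparison I am after. The names TwoColumnSort, compareLines and sortLines are made up for illustration, and it assumes the columns are separated by a single space.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TwoColumnSort {
    // Compare two lines: column 1 as a string (ascending),
    // then column 2 as an integer (descending).
    static int compareLines(String a, String b) {
        String[] fa = a.split(" ");
        String[] fb = b.split(" ");
        int byFirst = fa[0].compareTo(fb[0]);
        if (byFirst != 0) {
            return byFirst;
        }
        // Arguments swapped (b before a) to get descending numeric order.
        return Integer.compare(Integer.parseInt(fb[1]), Integer.parseInt(fa[1]));
    }

    // Returns a sorted copy; the input list is left untouched.
    static List<String> sortLines(List<String> lines) {
        List<String> sorted = new ArrayList<>(lines);
        sorted.sort(TwoColumnSort::compareLines);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "carrot<adog 1 value_c1",
                "carrot<adog 3 value_c3",
                "apple<adog 1 value_a1",
                "apple<abird 12 value_a2");
        for (String line : sortLines(input)) {
            System.out.println(line);
        }
        // Prints:
        // apple<abird 12 value_a2
        // apple<adog 1 value_a1
        // carrot<adog 3 value_c3
        // carrot<adog 1 value_c1
    }
}
```

This is only meant to pin down the ordering I expect from the job, not how Hadoop should implement it.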
The approach I have taken is to use JobConf.setKeyFieldComparatorOptions.
From reading various resources, putting this setting:
conf.setKeyFieldComparatorOptions("-k1 -k2nr");
conf.set("map.output.key.field.separator", " ");
should do what I want: sort the first column by string, and the second
column by number descending. I use a space character to separate the 2 key
pieces. But it doesn't seem to work. The actual output I get is also shown
below.
Any ideas on what I am doing wrong? The first column seems to be sorted
correctly, but some of the second column's values are not in the right order.
For example, these two rows should be reversed:
carrot<adog 1 value_c1
carrot<adog 3 value_c3
Any help is greatly appreciated.
David
/*sample input*/
apple<adog 3 value_a3
apple<adog 1 value_a1
apple<acat 2 value_a2
apple<abird 12 value_a2
carrot<adog 1 value_c1
carrot<adog 3 value_c3
carrot<abird 2 value_c2
banana<acat 1 value_b1
banana<abird 3 value_b3
banana<adog 2 value_b2
banana<adog 11 value_b11
banana<abird 17 value_b17
banana<acat 4 value_b4
/*expected output*/
apple<abird 12 value_a2
apple<acat 2 value_a2
apple<adog 3 value_a3
apple<adog 1 value_a1
banana<abird 17 value_b17
banana<abird 3 value_b3
banana<acat 4 value_b4
banana<acat 1 value_b1
banana<adog 11 value_b11
banana<adog 2 value_b2
carrot<abird 2 value_c2
carrot<adog 3 value_c3
carrot<adog 1 value_c1
/*actual output*/
apple<abird 12 value_a2
apple<acat 2 value_a2
apple<adog 1 value_a1
apple<adog 3 value_a3
banana<abird 17 value_b17
banana<abird 3 value_b3
banana<acat 1 value_b1
banana<acat 4 value_b4
banana<adog 11 value_b11
banana<adog 2 value_b2
carrot<abird 2 value_c2
carrot<adog 1 value_c1
carrot<adog 3 value_c3