Hi all,

I am using Hadoop 0.20.2 and I want to sort a huge amount of data. I've read about Terasort [from the examples], but it uses 10-byte char keys. Changing the keys from char to integer wasn't a good solution, because Terasort builds a trie to create total-order partitions. I got stuck when I tried to change the char trie to one suitable for numeric keys.

Then I tried Sort [also from the examples], and it did work for integer keys, but without total-order partitioning. As a result, the final output cannot be produced simply by concatenating all the reducers' outputs: each reducer sorts only its own subset of the data, and no merging occurs between reducers.
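
To make concrete what I mean by total-order partitioning for numeric keys, here is a minimal sketch in plain Java. It uses no Hadoop classes, and the class and method names (RangePartitionSketch, splitPoints, partitionFor) are hypothetical, made up only for illustration. The assumption is that split points are taken from a sample of the keys, and that each key is then routed by binary search so that every key in partition p is smaller than every key in partition p+1.

```java
import java.util.Arrays;

// Hypothetical illustration only: not part of Hadoop or its examples.
final class RangePartitionSketch {

    // Choose numPartitions - 1 split points from a sample of the keys,
    // spaced evenly through the sorted sample.
    static int[] splitPoints(int[] sample, int numPartitions) {
        int[] sorted = sample.clone();
        Arrays.sort(sorted);
        int[] splits = new int[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            int index = (int) ((long) i * sorted.length / numPartitions);
            splits[i - 1] = sorted[index];
        }
        return splits;
    }

    // Keys below splits[0] go to partition 0; keys in
    // [splits[p-1], splits[p]) go to partition p; keys at or above
    // the last split point go to the last partition.
    static int partitionFor(int[] splits, int key) {
        int pos = Arrays.binarySearch(splits, key);
        // A negative result encodes the insertion point as -(point) - 1.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        int[] splits = splitPoints(new int[] {9, 1, 8, 2, 7, 3, 6, 4, 5}, 3);
        for (int key : new int[] {1, 4, 6, 7, 9}) {
            System.out.println(key + " -> partition " + partitionFor(splits, key));
        }
    }
}
```

With a partitioning like this, each reducer could sort its own range independently, and concatenating the reducer outputs in partition order would give the fully sorted result.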

Can anyone please advise me on what to use, and how, in order to sort a huge amount of real numbers?
Looking forward to your replies.


Thank you.
Best,
Teodor
