Hi Teodor,

I am not clear on what you mean by 'real numbers'. Terasort works on bytes (a 10-byte key and a 90-byte payload). The actual 'meaning' of the bytes does not matter, as Hadoop uses binary comparators on the raw values.
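To illustrate the point about raw bytes: if the keys are floating-point numbers, one option is to encode each number into a fixed-width byte key whose unsigned byte order matches numeric order, so that a binary comparator sorts them correctly. The following is only a sketch of that idea. The class and method names (SortableDoubleKey, encode, decode, compareBytes) are made up for illustration and are not part of Terasort or Hadoop.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not a Hadoop class: order-preserving byte encoding for doubles.
final class SortableDoubleKey {

    // Encode a double as 8 big-endian bytes whose unsigned lexicographic
    // order matches the numeric order of the original values.
    static byte[] encode(double value) {
        long bits = Double.doubleToLongBits(value);
        // Negative values: flip every bit (reverses their order).
        // Non-negative values: flip only the sign bit (moves them above negatives).
        bits ^= (bits >> 63) | Long.MIN_VALUE;
        return ByteBuffer.allocate(8).putLong(bits).array();
    }

    // Inverse of encode().
    static double decode(byte[] key) {
        long bits = ByteBuffer.wrap(key).getLong();
        // Top bit set means the original was non-negative: flip the sign bit back.
        // Otherwise the original was negative: flip every bit back.
        bits ^= (~bits >> 63) | Long.MIN_VALUE;
        return Double.longBitsToDouble(bits);
    }

    // Unsigned byte-by-byte comparison, the kind of ordering a raw
    // binary comparator applies to serialized keys.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }
}
```

With an encoding like this, the key bytes can be compared without ever being deserialized, and decode() recovers the original number when the output is read back.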
Total order partitioning should also work with any WritableComparable key (if it doesn't, it's a bug). My guess is that your problem is converting the char trie to WritableComparable. Can you provide more background? Are the strings of fixed length?

Alex K

On Sun, Aug 1, 2010 at 2:23 PM, Teodor Macicas <[email protected]> wrote:

> Hi all,
>
> I am using hadoop 0.20.2 and I want to sort a huge amount of data. I've
> read about Terasort [from examples], but it uses 10-byte char keys.
> Changing the keys from char to integer wasn't a good solution, as Terasort
> builds a trie for creating total order partitions. I got stuck when I tried
> to change the char trie to one suitable for number keys.
>
> Then I gave Sort [also from examples] a try, and it did work for
> integer keys, but without total order partitioning. At the end of the day,
> the final result cannot be created just by putting together all the reducers'
> outputs. Each reducer sorts only a subset of the data, and no merging occurs
> between two reducers.
>
> Can anyone please advise me on what to use, and how, in order to sort a huge
> amount of real numbers?
> Looking forward to your replies.
>
> Thank you.
> Best,
> Teodor
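
For reference, the idea behind the total order partitioning discussed in this thread can be sketched in a few lines: take a sample of the keys, choose split points from it, and send each key to the partition whose range contains it. Each reducer then holds a contiguous key range, so the sorted reducer outputs can simply be concatenated in partition order. The sketch below is a simplified stand-alone illustration, not Hadoop's implementation; RangePartitionSketch, splitPoints and partition are hypothetical names, and plain double keys are assumed.

```java
import java.util.Arrays;

// Hypothetical illustration of range partitioning; not Hadoop code.
final class RangePartitionSketch {

    // Choose numPartitions - 1 split points at evenly spaced positions
    // of the sorted sample. Assumes a non-empty sample.
    static double[] splitPoints(double[] sample, int numPartitions) {
        double[] sorted = sample.clone();
        Arrays.sort(sorted);
        double[] splits = new double[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            int index = (int) ((long) i * sorted.length / numPartitions);
            splits[i - 1] = sorted[index];
        }
        return splits;
    }

    // Map a key to a partition with a binary search over the split points.
    // Keys below splits[0] go to partition 0, and a key equal to a split
    // point goes to the partition that starts at that split point.
    static int partition(double key, double[] splits) {
        int pos = Arrays.binarySearch(splits, key);
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }
}
```

Because partition() never decreases as the key grows, every key in partition i is less than or equal to every key in partition i + 1, which is the property a plain hash partitioner does not give.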
