I am wondering how Hadoop would perform sorting 1TB with, say, 1000 nodes.
Would it be possible for you to try the Terasort benchmark?
Devaraj Das wrote:
This is FYI. We at Yahoo! successfully ran Hadoop (an up-to-date trunk
version) on a cluster of 2000 nodes. The programs we ran were RandomWriter
and Sort. Sort performance was pretty good: we sorted 20TB of data in
2.5 hours! There were not many task failures. Most of the tasks that failed
encountered file checksum errors during merge and map output serving, and
some were killed due to lack of progress reporting. Overall, a pretty
successful run.