I am having a bit of trouble understanding how the TeraSort benchmark works, 
especially the fundamentals of how the data is sorted. If the data is being 
split into many chunks, wouldn't it all have to be merged back together into 
one fully sorted dataset at the end?
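
To make my confusion concrete, here is a toy sketch (in Python, with names I made up) of what I *think* the range-partitioning idea looks like — partition by key range, sort each partition on its own, then just concatenate. Please correct me if this mental model is wrong:

```python
import random

def range_partition_sort(records, num_partitions):
    # Pick split points by sampling the data (I believe TeraSort samples
    # the input to choose these; here I just take spaced sample values).
    sample = sorted(random.sample(records, min(len(records), 100)))
    step = len(sample) // num_partitions
    splits = [sample[i * step] for i in range(1, num_partitions)]

    # Route each record to the partition covering its key range.
    partitions = [[] for _ in range(num_partitions)]
    for r in records:
        idx = sum(r >= s for s in splits)  # which range bucket r falls in
        partitions[idx].append(r)

    # Each partition is sorted independently (in parallel, presumably),
    # and every key in partition i is <= every key in partition i+1,
    # so concatenating them gives a globally sorted result with no merge.
    result = []
    for p in partitions:
        result.extend(sorted(p))
    return result

data = [random.randint(0, 10_000) for _ in range(1_000)]
assert range_partition_sort(data, 4) == sorted(data)
```

If that is roughly right, then I guess the "re-integration" I was worried about is just writing the partitions out in order — but I'm not sure.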

And since a terabyte is huge, wouldn't that take a very long time? I seem to be 
missing a few crucial steps in the process, and if someone could help me 
understand how TeraSort works, that would be great. Any papers or videos on 
this topic would be greatly appreciated.



-SB
