Assume we have a medium-sized cluster, say 20 nodes, that is dedicated to a single job and cannot change in size. Assume we are sorting a large data set. As we increase the size of the data sorted, say from 100 GB to 1,000 GB to 10,000 GB, does the time for the sort scale as N or as N log N? I have heard both answers, with N log N coming largely from folks less familiar with Hadoop and N from others with more experience, but I am skeptical. Has anyone done tests who can contribute real data?
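To make concrete what real data would need to show, here is a rough sketch of the time ratios each model predicts for the three sizes above. It is only arithmetic, not a measurement: the 100-byte record size and the helper name `predicted_ratios` are my own assumptions, and constant overheads such as job startup and disk or network limits are ignored.

```python
import math

RECORD_BYTES = 100  # assumed record size, purely illustrative


def predicted_ratios(sizes_gb, record_bytes=RECORD_BYTES):
    """Return (linear, nlogn) predicted time ratios relative to the first size."""
    # Convert each data size to a record count N (1 GB taken as 10**9 bytes).
    records = [gb * 10**9 / record_bytes for gb in sizes_gb]
    base = records[0]
    # Model 1: time proportional to N.
    linear = [n / base for n in records]
    # Model 2: time proportional to N * log(N); the log base cancels in the ratio.
    nlogn = [(n * math.log2(n)) / (base * math.log2(base)) for n in records]
    return linear, nlogn


sizes = [100, 1000, 10000]
linear, nlogn = predicted_ratios(sizes)
for gb, a, b in zip(sizes, linear, nlogn):
    print(f"{gb:>6} GB   N: {a:7.1f}x   N log N: {b:7.1f}x")
```

Under these assumptions the two models differ by only about 11% at 1,000 GB (10x versus roughly 11.1x) and about 22% at 10,000 GB (100x versus roughly 122x), so timings would need to be fairly clean to tell them apart.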
--
Steven M. Lewis PhD
Institute for Systems Biology
Seattle WA
