I was looking through the kmeans code. As I recall, a good way to pick the initial cluster positions is to choose random data points. Is there an easy way to 'randomly select N records' in map reduce?
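To be concrete about what I mean by 'randomly select N records': on a single machine I would use something like reservoir sampling, which picks N records uniformly in one pass without knowing the total count up front. Below is a rough sketch in Python. The function name `reservoir_sample` and its parameters are my own, not anything from the kmeans code, and I am not claiming this is how it should be done in map reduce.

```python
import random

def reservoir_sample(records, n, rng=None):
    """Return up to n records chosen uniformly at random in a single pass."""
    rng = rng or random.Random()
    sample = []
    for i, record in enumerate(records):
        if i < n:
            # Fill the reservoir with the first n records.
            sample.append(record)
        else:
            # Keep the new record with probability n / (i + 1),
            # replacing a uniformly chosen slot in the reservoir.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                sample[j] = record
    return sample

# Example: pick 3 candidate initial centers from a stream of points.
points = [(x, x * 2) for x in range(1000)]
centers = reservoir_sample(points, 3)
```

What I don't see is the natural way to do the same thing across many mappers.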