My understanding is that HDFS places blocks randomly. Consistent with that, when I use "hadoop fsck" to look at the block placements for my files, I see that some nodes hold noticeably more blocks than the average. I'd expect these hot spots to cause a performance hit relative to a more even placement of blocks.
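For a sense of how much skew purely random placement produces on its own, here is a quick balls-into-bins simulation (plain Java, nothing Hadoop-specific; the node and block counts are made up for illustration):

    import java.util.Random;

    public class RandomPlacementSkew {
        public static void main(String[] args) {
            int nodes = 50;        // illustrative cluster size
            int blocks = 10_000;   // illustrative block count
            int[] perNode = new int[nodes];
            Random rng = new Random(42);

            // Place each block on a node chosen uniformly at random.
            for (int b = 0; b < blocks; b++) {
                perNode[rng.nextInt(nodes)]++;
            }

            // Compare the busiest node against the cluster average.
            int max = 0;
            for (int count : perNode) {
                max = Math.max(max, count);
            }
            double avg = (double) blocks / nodes;
            System.out.printf(
                "average blocks/node = %.1f, busiest node = %d (%.0f%% above average)%n",
                avg, max, 100.0 * (max - avg) / avg);
        }
    }

Even with these toy numbers, the busiest node lands noticeably above the average, which is the same kind of hot spot I'm seeing in the fsck output.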
I'd like to experiment with non-random block placement to see whether it yields a performance improvement. Where in the codebase should I start looking for the existing random placement logic? (A sketch of the sort of policy I have in mind is in the P.S. below.)

Cheers,
John
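P.S. For concreteness, here is the shape of the policy I'd like to plug in, written against a hypothetical interface. All of the names below (PlacementPolicy, chooseTargets, DataNodeInfo) are my own inventions, not the real HDFS classes; finding the real ones is exactly what I'm asking about.

    import java.util.Comparator;
    import java.util.List;
    import java.util.stream.Collectors;

    // Hypothetical stand-in for however HDFS describes a live datanode.
    class DataNodeInfo {
        final String host;
        final long blockCount; // blocks currently stored on this node

        DataNodeInfo(String host, long blockCount) {
            this.host = host;
            this.blockCount = blockCount;
        }
    }

    // Hypothetical placement hook: choose the datanodes that will hold
    // the replicas of one new block.
    interface PlacementPolicy {
        List<DataNodeInfo> chooseTargets(int numReplicas, List<DataNodeInfo> liveNodes);
    }

    // Instead of choosing nodes at random, always pick the currently
    // least-loaded nodes, so per-node block counts stay close to the
    // cluster average.
    class LeastLoadedPlacementPolicy implements PlacementPolicy {
        @Override
        public List<DataNodeInfo> chooseTargets(int numReplicas, List<DataNodeInfo> liveNodes) {
            return liveNodes.stream()
                    .sorted(Comparator.comparingLong(n -> n.blockCount))
                    .limit(numReplicas)
                    .collect(Collectors.toList());
        }
    }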