Hey guys, I've been reading up on DHTs quite a bit recently, and have come across Vivaldi, described as a "topology-aware structured overlay", in a few places. Vivaldi gives each node synthetic coordinates in a 2- or 3-dimensional space (these are positions that get adjusted over time, not hashes of the node's identity). Every time a node communicates with another node, it uses the measured round-trip time to adjust its own coordinates. The distance between two nodes' coordinates then serves as an estimate of the latency between them, so you can compare the relative connection quality between any 2 nodes without measuring it directly. (I'm probably explaining it all wrong. See this paper instead: http://portal.acm.org/citation.cfm?id=1272980.1272985&coll=GUIDE&dl=ACM&CFID=15151515&CFTOKEN=6184618 )
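To make the coordinate adjustment a bit more concrete, here is a rough sketch of the simple, fixed-step form of the update as I understand it. The function names and the delta value of 0.25 are my own choices for illustration, and the adaptive step size and per-node error estimates from the full algorithm are left out, so treat this as a toy rather than a faithful implementation.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two coordinate lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def vivaldi_update(xi, xj, rtt, delta=0.25):
    """Return new coordinates for node i after measuring `rtt` to node j.

    xi, xj: current coordinates of the local and remote node.
    rtt:    measured round-trip time (same units as the coordinates).
    delta:  fixed step size (assumed value; the full algorithm adapts it).
    """
    dist = distance(xi, xj)
    # Positive error: the nodes are "too close" in coordinate space.
    error = rtt - dist
    if dist == 0.0:
        # Coincident nodes have no defined direction, so pick a random one.
        direction = [random.uniform(-1.0, 1.0) for _ in xi]
        norm = math.sqrt(sum(d * d for d in direction)) or 1.0
        direction = [d / norm for d in direction]
    else:
        # Unit vector pointing from node j towards node i.
        direction = [(a - b) / dist for a, b in zip(xi, xj)]
    # Move a fraction of the error along that direction.
    return [a + delta * error * d for a, d in zip(xi, direction)]


# One update: node i at the origin, node j at (10, 0), measured RTT of 20.
new_xi = vivaldi_update([0.0, 0.0], [10.0, 0.0], 20.0)
```

In this example the predicted latency (10) is lower than the measured one (20), so node i moves away from node j, and the coordinate distance grows from 10 to 12.5. Repeating this for every measurement is what lets the coordinates settle down over time.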
In Hadoop's case, namenodes and jobtrackers could use Vivaldi coordinates reported by datanodes, and attempt to either minimize or maximize network proximity depending on the goal (for example, keeping a task close to its input data, or placing replicas far apart). I know some work has been going on to integrate rack-awareness, so I just thought I'd point out the possibility of a self-managing solution in case you guys weren't aware of it.

Stu Hood
Webmail.us
