The key here is that the task farm need not coincide exactly with the storage farm.
This is a good point. When running a utility farm with multiple job sub-clusters, it would be a royal pain in the ass if the data had to be moved to the sub-cluster you want to run your job in. If you don't move the data, though, then data/task locality changes meaning pretty drastically.

On 9/18/07 8:37 AM, "Doug Cutting" <[EMAIL PROTECTED]> wrote:

>> JobInProgress.findNewTask seems to see the world as local or other,
>> with no rack-awareness. are you
>> referring to the ReplicationTargetChooser, which will always put one of the
>> three replicas (assuming your replication level is 3) on your rack, hence
>> increasing your chances of finding the block within your sub-jobtracker net?
>
> That works too, but I was thinking that, even if a host in your cluster
> does not contain a block, a map task could be placed on a host in a rack
> that contains the block, so the map input would be rack-local.
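
To make the rack-local idea concrete, here is a rough sketch in Python of how a scheduler might rank candidate hosts when the job sub-cluster does not hold the block itself. This is not Hadoop's code and not what JobInProgress.findNewTask does; the function names, host names, and rack map are all hypothetical, and it only assumes the scheduler knows which rack each host sits in.

```python
from typing import Dict, List, Optional, Set

# Hypothetical locality levels: lower is better.
NODE_LOCAL, RACK_LOCAL, OFF_RACK = 0, 1, 2


def locality_level(host: str, block_hosts: Set[str], rack_of: Dict[str, str]) -> int:
    """Classify how close `host` is to any replica of a block."""
    if host in block_hosts:
        return NODE_LOCAL
    # Racks that hold at least one replica of the block.
    block_racks = {rack_of[h] for h in block_hosts if h in rack_of}
    if rack_of.get(host) in block_racks:
        return RACK_LOCAL
    return OFF_RACK


def pick_host(candidates: List[str], block_hosts: Set[str],
              rack_of: Dict[str, str]) -> Optional[str]:
    """Pick the candidate with the best locality; ties go to the first listed."""
    if not candidates:
        return None
    return min(candidates, key=lambda h: locality_level(h, block_hosts, rack_of))


# Storage hosts s1..s3 hold data; compute hosts c1, c2 form a job sub-cluster.
rack_of = {"s1": "r1", "s2": "r1", "s3": "r2", "c1": "r1", "c2": "r3"}
block_hosts = {"s1", "s3"}          # replicas of one block
sub_cluster = ["c2", "c1"]          # hosts this job may run on

chosen = pick_host(sub_cluster, block_hosts, rack_of)
```

Here neither compute host stores the block, but c1 shares rack r1 with the replica on s1, so it is preferred over c2 and the map input would be rack-local rather than off-rack.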
