The key here is that the task farm need not coincide exactly with the
storage farm.

This is a good point.  When running a utility farm with multiple job
sub-clusters, it would be a royal pain in the ass if the data had to be
moved to the sub-cluster you want to run your job in.  If you don't move the
data, though, then data/task locality changes meaning pretty drastically.


On 9/18/07 8:37 AM, "Doug Cutting" <[EMAIL PROTECTED]> wrote:

>> JobInProgress.findNewTask seems to see the world as local or other,
>> with no rack-awareness.  Are you
>> referring to the ReplicationTargetChooser, which will always put one of the
>> three replicas (assuming your replication level is 3) on your rack, hence
>> increasing your chances of finding the block within your sub-jobtracker net?
> 
> That works too, but I was thinking that, even if a host in your cluster
> does not contain a block, a map task could be placed on a host in a rack
> that contains the block, so the map input would be rack-local.
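
To make the rack-local case concrete, here is a minimal Python sketch of the
placement preference described above. It is not Hadoop's JobInProgress code:
the function names (locality, pick_host), the host names, and the rack_of
mapping are all hypothetical, and a static host-to-rack map is assumed.

```python
# Illustrative sketch only; these names are assumptions, not Hadoop APIs.

def locality(task_host, block_hosts, rack_of):
    """Classify a placement as 'node-local', 'rack-local', or 'off-rack'."""
    if task_host in block_hosts:
        return "node-local"
    # Racks that hold at least one replica of the block.
    block_racks = {rack_of[h] for h in block_hosts}
    if rack_of[task_host] in block_racks:
        return "rack-local"
    return "off-rack"


def pick_host(candidate_hosts, block_hosts, rack_of):
    """Pick the candidate with the best locality; ties keep list order."""
    order = {"node-local": 0, "rack-local": 1, "off-rack": 2}
    return min(
        candidate_hosts,
        key=lambda h: order[locality(h, block_hosts, rack_of)],
    )


# Hypothetical layout: a1, b1, b2 are storage hosts holding the replicas;
# a2 and d1 are task hosts in the sub-cluster and hold no replica.
rack_of = {"a1": "rackA", "a2": "rackA", "b1": "rackB", "b2": "rackB", "d1": "rackD"}
block_hosts = ["a1", "b1", "b2"]
task_hosts = ["d1", "a2"]

chosen = pick_host(task_hosts, block_hosts, rack_of)
```

Here neither task host stores the block, but a2 shares a rack with the replica
on a1, so the sketch prefers a2 and the map input is read rack-locally.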
