On Aug 15, 2010, at 10:34 AM, Kris Jirapinyo wrote:
> 1) Our new cluster has 25 machines but 100 mappers.  When distcp is 
> triggered, it seems to allocate 4 mappers per machine.  Is this normal? The 
> issue here is that say distcp only needs 8 mappers, I would think that distcp 
> would try to distribute those to different machines so that perhaps IO will 
> not be saturated on one machine.  What I've been seeing is that for those 8 
> map tasks, 4 are assigned to one machine and 4 to the other, as opposed to 8 
> being assigned to a different machine altogether.

I don't think distcp (or any other job, for that matter) can give the 
scheduler hints about how its tasks should be distributed, other than by 
pointing to its input files.  So very likely, distcp's input files are stored 
on the nodes where those tasks are running.
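
To illustrate the reasoning above, here is a toy model in Python. It is not 
Hadoop's scheduler code: the function name, the node names, and the rule that 
a tracker only ever takes node-local splits are all assumptions made for the 
sketch (a real scheduler can also fall back to non-local tasks). It only 
shows that when placement follows the input data, 8 tasks whose input sits on 
two nodes end up as 4 + 4 on those two nodes.

```python
from collections import defaultdict


def assign_tasks(split_hosts, trackers, slots_per_tracker):
    """Toy locality-only assignment (hypothetical, not the Hadoop API).

    split_hosts: list of sets; entry i holds the hosts storing split i.
    trackers: hosts in the order they ask for work.
    Returns (placement, unassigned splits).
    """
    pending = dict(enumerate(split_hosts))
    placement = defaultdict(list)
    for host in trackers:
        for _ in range(slots_per_tracker):
            # Only consider splits that have a copy on this host.
            local = [s for s, hosts in pending.items() if host in hosts]
            if not local:
                break
            placement[host].append(local[0])
            del pending[local[0]]
    return dict(placement), pending


# 25 machines with 4 map slots each, as in the question.
trackers = [f"node{i}" for i in range(1, 26)]

# Case 1: the input for 8 tasks lives on only two nodes.
clustered = [{"node1"}] * 4 + [{"node2"}] * 4
print(assign_tasks(clustered, trackers, 4)[0])

# Case 2: the same 8 inputs, each stored on a different node.
spread = [{f"node{i}"} for i in range(1, 9)]
print(assign_tasks(spread, trackers, 4)[0])
```

In the first case all eight tasks land on node1 and node2; in the second, 
each task lands on its own node.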

You can always try raising the replication factor through distcp's 
parameters, so that more nodes hold a copy of the data and the scheduler has 
more places to run the tasks.
