If a job has two input files, f1 and f2, and a Mapper on machine m1 is running against a file (f2) that is stored far from m1, will the overhead of copying f2 over to m1 be worth it?
That is, is the amount of resources required to read the data off a remote machine (m2) worth it? Or would it be better if that remote machine (m2) simply processed both files (f1, f2) in turn?

Jay Vyas
http://jayunit100.blogspot.com