Then I think you might be best off running a getmerge (hadoop fs -getmerge) on each client. How you trigger that is up to you, but something like Fabric [1] might help. Others might propose different solutions, but MR doesn't sound like a natural choice to me.
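A minimal sketch of triggering the getmerge on every client at once. It assumes passwordless ssh to each host and that the hadoop CLI is on each host's PATH. It uses plain ssh via subprocess rather than Fabric so it stays self-contained. The host names and paths are hypothetical placeholders.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def getmerge_command(hdfs_src, local_dst):
    """Build the `hadoop fs -getmerge <src> <localdst>` invocation for one host."""
    return ["hadoop", "fs", "-getmerge", hdfs_src, local_dst]


def ssh_command(host, remote_argv):
    """Wrap a command so it runs on `host` over ssh (assumes key-based login)."""
    return ["ssh", host, shlex.join(remote_argv)]


def run_on_all(hosts, hdfs_src, local_dst, dry_run=True):
    """Run getmerge on every host in parallel.

    With dry_run=True, nothing is executed; the ssh command for each host
    is returned so it can be inspected first.
    """
    cmds = {h: ssh_command(h, getmerge_command(hdfs_src, local_dst)) for h in hosts}
    if dry_run:
        return cmds
    # One thread per host; each ssh call blocks until that host's merge is done.
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        futures = {h: pool.submit(subprocess.run, c, check=True) for h, c in cmds.items()}
        return {h: f.result().returncode for h, f in futures.items()}


if __name__ == "__main__":
    # Hypothetical cluster and paths; replace with your own.
    for host, cmd in run_on_all(["node1", "node2"], "/user/hamid/out", "/tmp/out.txt").items():
        print(host, cmd)
```

Passing dry_run=False actually runs the merges. Each host then ends up with its own full local copy of the HDFS output directory, concatenated into a single file.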
I would expect this to be the fastest way of getting the data onto local disk. There is one alternative you might consider: for whatever job is producing the input files, set the replication factor equal to the number of machines. That way every block will be local to every machine, although the data will likely be split across multiple files (part000001, etc.).

I hope this helps,
Tim

[1] http://docs.fabfile.org/en/1.4.3/index.html

On Thu, Aug 23, 2012 at 1:08 PM, Hamid Oliaei <oli...@gmail.com> wrote:
> Hi,
>
> First of all, thank you Tim for giving your time.
>
> The answer to the first question is yes.
> My inputs are triples (sub, pre, obj) and they are stored on HDFS.
> The problem is: after running some MR jobs, some data is generated on all
> machines, and I want each machine to send part of it to the others in minimum
> time, for use in the next phase.
> I know this is not a natural fit for MR, but it was the first
> solution that came to mind, and I am glad to hear other suggestions.
>
> Regards,
> Hamid