On Fri, Jan 14, 2011 at 3:09 AM, Pedro Costa <psdc1...@gmail.com> wrote:
> Hi,
>
> If a split location contains more that one location, it means that
> this split file is replicated through all locations, or it means that
> a split is divided into several blocks, and each block is in one
> location?

The list of locations requests that the map run on one of those machines or on the same rack as one of those machines. Currently there is no way to indicate that one machine in the list is "better" than another. If an input split covers multiple blocks, the InputFormat is best served by picking the top N machines that are close to a copy of most of the data, where N is roughly 3 to 5.

-- Owen
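
To illustrate the "top N machines" idea for a split that covers several blocks, here is a minimal Python sketch. It is not Hadoop code: the function name `top_hosts` and the `(length, hosts)` tuple layout are assumptions made up for this example, and it ignores rack locality entirely. It simply counts how many bytes of the split each host holds locally and keeps the N hosts with the most.

```python
from collections import defaultdict


def top_hosts(blocks, n=3):
    """Return up to n hosts holding the most bytes of a split.

    blocks is a hypothetical list of (length_in_bytes, hosts) pairs,
    one per block in the split, where hosts lists the machines that
    hold a replica of that block.
    """
    bytes_per_host = defaultdict(int)
    for length, hosts in blocks:
        # set() guards against a host being listed twice for one block
        for host in set(hosts):
            bytes_per_host[host] += length
    # Most local bytes first; ties broken by host name for determinism
    ranked = sorted(bytes_per_host.items(), key=lambda kv: (-kv[1], kv[0]))
    return [host for host, _ in ranked[:n]]


# A split covering three blocks, each replicated on three hosts
blocks = [
    (128, ["a", "b", "c"]),
    (128, ["a", "d", "e"]),
    (64, ["b", "d", "f"]),
]
print(top_hosts(blocks, n=3))
```

Here host `a` holds 256 bytes of the split, `b` and `d` hold 192 each, so those three would be reported as the split's locations.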