But I guess it isn't always possible to achieve optimal scheduling, right? What happens then? Is network topology taken into account, perhaps?
On 26.10.2011, at 4:42, Mapred Learn <[email protected]> wrote:

> Yes, that's right!
>
> Sent from my iPhone
>
> On Oct 25, 2011, at 5:36 PM, <[email protected]> wrote:
>
>> So I guess the job tracker is the one reading the HDFS metadata and then
>> optimizing the scheduling of map tasks based on that?
>>
>> On 10/25/11 3:13 PM, "Shevek" <[email protected]> wrote:
>>
>>> We pray to $deity that the mapreduce split size is about the same as (or
>>> smaller than) the HDFS block size. We also pray that file format
>>> synchronization points are frequent compared to block boundaries.
>>>
>>> The JobClient finds the location of each block of each file. It splits the
>>> job into FileSplits, one per block.
>>>
>>> Each FileSplit is processed by a task. The split contains the locations on
>>> which the task would best be run.
>>>
>>> The last block may be very short. It is then subsumed into the preceding
>>> block.
>>>
>>> Some data is transferred between nodes when the synchronization point for
>>> the file format is not at a block boundary. (It basically never is, but we
>>> hope it's close, or the purpose of MR locality is defeated.)
>>>
>>> Specifically to your questions: most of the data should be read from the
>>> local HDFS node under the above assumptions. The communication layer
>>> between mapreduce and HDFS is not special.
>>>
>>> S.
>>>
>>> On 25 October 2011 11:49, <[email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>> I am trying to understand how data locality works in Hadoop.
>>>>
>>>> If you run a map reduce job, do the mappers only read data from the host
>>>> on which they are running?
>>>>
>>>> Is there a communication protocol between the map reduce layer and the
>>>> HDFS layer so that the mapper is optimized to read data locally?
>>>>
>>>> Any pointers on which layer of the stack handles this?
>>>>
>>>> Cheers,
>>>> Ivan
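For anyone following along: the split logic Shevek describes (one split per block, with a short trailing block folded into the preceding split) can be sketched roughly like this. This is a hedged, self-contained approximation, not the actual Hadoop source; `SPLIT_SLOP`, the `computeSplits` helper, and the `{offset, length}` pair representation are my own simplifications of what FileInputFormat does, and real splits also carry the preferred host locations obtained from the NameNode.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of how a file is carved into FileSplits, one per block,
// with a short trailing remainder merged into the final split.
public class SplitSketch {
    // If the tail is within 10% of a full split, keep it in one split
    // rather than scheduling a tiny extra task (assumed constant).
    static final double SPLIT_SLOP = 1.1;

    // Returns {offset, length} pairs covering the whole file.
    static List<long[]> computeSplits(long fileLength, long splitSize) {
        List<long[]> splits = new ArrayList<>();
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits.add(new long[] { fileLength - remaining, splitSize });
            remaining -= splitSize;
        }
        if (remaining > 0) {
            // The short last block is subsumed into the final split.
            splits.add(new long[] { fileLength - remaining, remaining });
        }
        return splits;
    }

    public static void main(String[] args) {
        // A 260 MB file with 128 MB blocks yields two splits:
        // the 4 MB tail rides along with the second split.
        for (long[] s : computeSplits(260L << 20, 128L << 20)) {
            System.out.println("offset=" + s[0] + " length=" + s[1]);
        }
    }
}
```

In the real framework each split would additionally record the DataNode hosts holding that block, which is what the JobTracker uses to place the task near its data.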
