Actually, the input to the map function is a single key/value pair.

The input to a map TASK, however, is a slice (split) of a file, from which the
key/value pairs fed to the map function are taken.  Since the JobTracker that
creates the map task knows where that file slice is stored, it can start the
task on a node holding the data.
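For instance, here is a rough sketch against the org.apache.hadoop.mapred API
(the class name and key/value types below are just an illustration, and exact
signatures vary a bit between Hadoop releases).  The map function itself only
ever sees one key/value pair at a time:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Illustrative mapper: with TextInputFormat the key is the byte offset of a
// line within the file split assigned to this task, and the value is the line.
public class LineLengthMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, IntWritable> output,
                  Reporter reporter) throws IOException {
    // The mapper never asks where its data lives; it just processes the pair.
    // The framework already placed this task near the blocks of its split.
    output.collect(value, new IntWritable(value.toString().length()));
  }
}

The locality hints come from the splits themselves: each InputSplit reports
the hosts that store its data (InputSplit.getLocations()), and the scheduler
uses that information when assigning the task to a tasktracker.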



On 8/23/07 3:32 AM, "Samuel LEMOINE" <[EMAIL PROTECTED]> wrote:

> Well, I don't get it... when you pass arguments to a map job, you just
> give a key and a value, so how can Hadoop make the link between those
> arguments and the data concerned? Really, your answer doesn't help me at
> all, sorry ^^
> 
> Devaraj Das wrote:
>> That's the paradigm of Hadoop's Map-Reduce.
>> 
>>   
>>> -----Original Message-----
>>> From: Samuel LEMOINE [mailto:[EMAIL PROTECTED]
>>> Sent: Thursday, August 23, 2007 2:48 PM
>>> To: [email protected]
>>> Subject: "Moving Computation is Cheaper than Moving Data"
>>> 
>>> When I read the Hadoop documentation:
>>> The Hadoop Distributed File System: Architecture and Design
>>> (http://lucene.apache.org/hadoop/hdfs_design.html)
>>> 
>>> a paragraph caught my attention:
>>> 
>>> 
>>>       "Moving Computation is Cheaper than Moving Data"
>>> 
>>> A computation requested by an application is much more
>>> efficient if it is executed near the data it operates on.
>>> This is especially true when the size of the data set is
>>> huge. This minimizes network congestion and increases the
>>> overall throughput of the system. The assumption is that it
>>> is often better to migrate the computation closer to where
>>> the data is located rather than moving the data to where the
>>> application is running. HDFS provides interfaces for
>>> applications to move themselves closer to where the data is located.
>>> 
>>> 
>>> 
>>> 
>>> I'd like to know how to perform that, especially with the aim
>>> of distributed Lucene search. Which Hadoop classes should I
>>> use to do that?
>>> 
>>> Thanks in advance,
>>> 
>>> Samuel
>>> 
>>>     
>> 
>> 
>>   
> 
