Re: "Moving Computation is Cheaper than Moving Data"

Samuel LEMOINE Thu, 23 Aug 2007 03:32:44 -0700

Well, I don't get it... When you pass arguments to a map job, you just give a key and a value; how can Hadoop make the link between those arguments and the data concerned? Really, your answer doesn't help me at all, sorry ^^


Devaraj Das wrote:

That's the paradigm of Hadoop's Map-Reduce.
-----Original Message-----
From: Samuel LEMOINE [mailto:[EMAIL PROTECTED]]
Sent: Thursday, August 23, 2007 2:48 PM
To: [email protected]
Subject: "Moving Computation is Cheaper than Moving Data"
When I read the Hadoop documentation:
The Hadoop Distributed File System: Architecture and Design
(http://lucene.apache.org/hadoop/hdfs_design.html)

a paragraph caught my attention:


      "Moving Computation is Cheaper than Moving Data"
A computation requested by an application is much more efficient if it is executed near the data it operates on. This is especially true when the size of the data set is huge. This minimizes network congestion and increases the overall throughput of the system. The assumption is that it is often better to migrate the computation closer to where the data is located rather than moving the data to where the application is running. HDFS provides interfaces for applications to move themselves closer to where the data is located.
I'd like to know how to do that, especially with the aim of distributed Lucene search. Which Hadoop classes should I use to do that?
Thanks in advance,

Samuel
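
For readers of this thread, a rough illustration of the idea in the quoted paragraph may help. The key and value handed to a map function are records read from a piece of the input (a split), and it is the framework, not the map code, that knows which nodes store that piece and tries to start the task there. The sketch below is a toy simulation in plain Python, not Hadoop code: the function name assign_tasks, the block ids, and the node names are all hypothetical, and the greedy policy is an assumption chosen for simplicity rather than Hadoop's actual scheduler.

```python
def assign_tasks(block_hosts, slots_per_node):
    """Toy locality-aware assignment of one task per block.

    block_hosts:    dict of block id -> list of nodes holding a replica
    slots_per_node: dict of node name -> number of free task slots
    Returns a dict of block id -> (node, is_local).
    """
    free = dict(slots_per_node)  # copy so the caller's dict is untouched
    assignment = {}
    deferred = []

    # First pass: run each task on a node that already stores its block,
    # so the computation moves and the data stays where it is.
    for block, hosts in block_hosts.items():
        local = [h for h in hosts if free.get(h, 0) > 0]
        if local:
            node = max(local, key=lambda h: free[h])  # least-loaded replica holder
            free[node] -= 1
            assignment[block] = (node, True)
        else:
            deferred.append(block)

    # Second pass: leftover tasks take any free slot; in a real cluster the
    # block would then have to be read over the network.
    for block in deferred:
        node = next((n for n, s in free.items() if s > 0), None)
        if node is None:
            raise RuntimeError("no free task slot left for " + block)
        free[node] -= 1
        assignment[block] = (node, False)

    return assignment


if __name__ == "__main__":
    # Hypothetical cluster: three blocks, each replicated on two of three nodes.
    blocks = {
        "blk_1": ["nodeA", "nodeB"],
        "blk_2": ["nodeB", "nodeC"],
        "blk_3": ["nodeA", "nodeC"],
    }
    slots = {"nodeA": 1, "nodeB": 1, "nodeC": 1}
    for block, (node, is_local) in assign_tasks(blocks, slots).items():
        print(block, node, "local" if is_local else "remote")
```

The point of the sketch is only that the link between a map task and its data is made by whoever hands out the tasks, using the known replica locations; the map function itself never has to name a machine.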
