That's the paradigm of Hadoop's Map-Reduce: you express the computation as Mapper and Reducer implementations, and the framework moves the computation to the data for you. The scheduler uses the block locations reported by HDFS to run each map task on (or near) a node that stores the block it processes, so your application code never has to handle data locality itself.
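In case a toy sketch helps: the snippet below is plain Python, not Hadoop's actual API (in Hadoop you would implement the Mapper and Reducer interfaces instead), and all the function names are illustrative. It only shows the map → shuffle → reduce data flow that makes it possible for the map step to run wherever the data lives.

```python
from collections import defaultdict

# Toy word count illustrating the MapReduce data flow. In Hadoop,
# each map call would execute on the node storing the input block;
# here everything runs locally, so this only shows the structure.

def map_phase(doc_id, text):
    # Emit (word, 1) pairs: the per-block computation that Hadoop
    # ships to the data instead of shipping the data to the code.
    return [(word, 1) for word in text.split()]

def reduce_phase(word, counts):
    # Combine all intermediate values for one key.
    return (word, sum(counts))

def run_job(docs):
    # Shuffle: group the intermediate pairs by key, then reduce.
    grouped = defaultdict(list)
    for doc_id, text in docs.items():
        for word, count in map_phase(doc_id, text):
            grouped[word].append(count)
    return dict(reduce_phase(w, c) for w, c in grouped.items())

docs = {"a": "moving computation is cheaper",
        "b": "moving data is costly"}
print(run_job(docs))
```

Only the map and reduce functions are yours; the grouping and the placement of the map work are the framework's job, which is exactly where the "move computation to the data" optimization happens.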

> -----Original Message-----
> From: Samuel LEMOINE [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, August 23, 2007 2:48 PM
> To: [email protected]
> Subject: "Moving Computation is Cheaper than Moving Data"
> 
> When I read the Hadoop documentation:
> The Hadoop Distributed File System: Architecture and Design
> (http://lucene.apache.org/hadoop/hdfs_design.html)
> 
> a paragraph held my attention:
> 
> 
>       "Moving Computation is Cheaper than Moving Data"
> 
> A computation requested by an application is much more 
> efficient if it is executed near the data it operates on. 
> This is especially true when the size of the data set is 
> huge. This minimizes network congestion and increases the 
> overall throughput of the system. The assumption is that it 
> is often better to migrate the computation closer to where 
> the data is located rather than moving the data to where the 
> application is running. HDFS provides interfaces for 
> applications to move themselves closer to where the data is located.
> 
> 
> 
> 
> I'd like to know how to perform that, especially with the aim 
> of distributed Lucene search. Which Hadoop classes should I 
> use to do that?
> 
> Thanks in advance,
> 
> Samuel
> 
