“Moving Computation is Cheaper than Moving Data”

Samuel LEMOINE Thu, 23 Aug 2007 02:18:39 -0700

When I read the Hadoop documentation:

The Hadoop Distributed File System: Architecture and Design (http://lucene.apache.org/hadoop/hdfs_design.html)


a paragraph caught my attention:


     “Moving Computation is Cheaper than Moving Data”

A computation requested by an application is much more efficient if it is executed near the data it operates on. This is especially true when the size of the data set is huge. This minimizes network congestion and increases the overall throughput of the system. The assumption is that it is often better to migrate the computation closer to where the data is located rather than moving the data to where the application is running. HDFS provides interfaces for applications to move themselves closer to where the data is located.

I'd like to know how to do that, especially with the aim of distributed Lucene search. Which Hadoop classes should I use for this?
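
To make the idea in the quoted paragraph concrete, here is a toy sketch of what "moving the computation to the data" could look like: given a map from data blocks to the hosts that store them, a task is handed to an idle worker that already holds the block, and only falls back to a remote read when no such worker is free. This is only an illustration under my own assumptions, not Hadoop's API; the names `block_hosts`, `idle_workers` and `pick_worker` are hypothetical.

```python
# Toy illustration of data-local task placement; NOT Hadoop's API.
# All names here (block_hosts, idle_workers, pick_worker) are hypothetical.

def pick_worker(block_id, block_hosts, idle_workers):
    """Return (worker, is_local) for the task that processes block_id.

    Prefers an idle worker that already stores the block; otherwise
    falls back to the first idle worker, which must read the block
    over the network.
    """
    if not idle_workers:
        raise ValueError("no idle workers available")
    hosts = block_hosts.get(block_id, set())
    for worker in idle_workers:
        if worker in hosts:
            return worker, True   # computation moves to the data
    return idle_workers[0], False  # data has to move to the computation


# Assumed layout: which nodes hold a replica of each index partition.
block_hosts = {
    "index-part-0": {"node1", "node2"},
    "index-part-1": {"node3"},
}
idle_workers = ["node2", "node4"]

for block in sorted(block_hosts):
    worker, is_local = pick_worker(block, block_hosts, idle_workers)
    print(block, "->", worker, "(local)" if is_local else "(remote read)")
```

In this sketch the search over `index-part-0` runs on a node that already stores it, while `index-part-1` has no idle local node and needs a remote read.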


Thanks in advance,

Samuel
