When I read the Hadoop documentation:
The Hadoop Distributed File System: Architecture and Design
(http://lucene.apache.org/hadoop/hdfs_design.html)
a paragraph caught my attention:
“Moving Computation is Cheaper than Moving Data”
A computation requested by an application is much more efficient if it
is executed near the data it operates on. This is especially true when
the size of the data set is huge. This minimizes network congestion and
increases the overall throughput of the system. The assumption is that
it is often better to migrate the computation closer to where the data
is located rather than moving the data to where the application is
running. HDFS provides interfaces for applications to move themselves
closer to where the data is located.
I'd like to know how to do that, especially for distributed Lucene
search. Which Hadoop classes should I use?
Thanks in advance,
Samuel