On Thu, Dec 13, 2007 at 11:03:50AM -0800, Ted Dunning wrote:
> 
> I don't think so (but I don't run nutch)
> 
> To actually run searches, the search engines copy the index to local
> storage.  Having them in HDFS is very nice, however, as a way to move them
> to the right place.

Even with an extremely fast network connection between the nodes, moving indexes
that are several gigabytes in size seems very slow.
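
For concreteness, my understanding of the copy step you describe is roughly the
following sketch against the Hadoop FileSystem API (the paths here are made up):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PullIndex {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem hdfs = FileSystem.get(conf);
            // copy one index shard (hypothetical path) from HDFS
            // down to the search node's local disk before serving queries
            hdfs.copyToLocalFile(new Path("/indexes/part-00000"),
                                 new Path("/var/local/search/index/part-00000"));
        }
    }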

Is there any way to guarantee that a search request is sent to the particular data
node which already holds the required part of the index, or to guarantee that all
reduce jobs run on the same host, so that the index ends up located on that host?
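
To illustrate what I mean by "the node which already holds the index": I could
presumably ask the namenode which hosts hold the blocks of a shard, something like
the sketch below (assuming a Hadoop release that exposes
FileSystem.getFileBlockLocations; the shard path is made up):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class IndexLocality {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // hypothetical path to one index shard produced by a reduce task
            Path shard = new Path("/indexes/part-00000");
            FileStatus status = fs.getFileStatus(shard);
            // ask the namenode which datanodes hold each block of the shard
            BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                for (String host : block.getHosts()) {
                    System.out.println(shard + " block on " + host);
                }
            }
        }
    }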

I feel like map/reduce is a perfect way to index a large set of documents, but I'm
not sure how the searching will be performed later. I can imagine the search request
being broadcast to ALL nodes, with each node taking the request, performing its part
of the search and returning (or not) results, which are then reduced later. However,
as far as I can see, Hadoop will send the request to the first node that appears to
be free - and not necessarily to the node which holds the index suitable for this
request?
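
What I have in mind for the broadcast case is roughly the sketch below. The
ShardSearcher interface is hypothetical - it stands in for whatever per-node search
service would actually run against the local index copy:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // hypothetical client for one search node holding one index shard locally
    interface ShardSearcher {
        List<String> search(String query) throws Exception;  // returns matching doc ids
    }

    public class BroadcastSearch {
        // send the query to every shard in parallel, then merge (reduce) the partial results
        static List<String> search(List<ShardSearcher> shards, String query) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(shards.size());
            List<Future<List<String>>> futures = new ArrayList<>();
            for (ShardSearcher shard : shards) {
                futures.add(pool.submit(() -> shard.search(query)));
            }
            List<String> merged = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                merged.addAll(f.get());  // a real merge would re-rank by score
            }
            pool.shutdown();
            return merged;
        }
    }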

-- 
Eugene N Dzhurinsky
