Hi Doug,
In the future I would like to implement a more automated distributed search system than Nutch currently has. One way to do this might be to use MapReduce. Each map task's input could be an index and some segment data. The map method would serve queries, i.e., run a Nutch DistributedSearch.Server. It would first copy the index out of NDFS to the local disk, for better performance.

I have two questions regarding this mechanism.
First, how do you plan to make the running search servers known to the master (the search client)? I can imagine a mechanism similar to the one the tasktracker and jobtracker use, a kind of heartbeat message. Second, wouldn't it also be possible to solve NUTCH-92 (DistributedSearch incorrectly scores results) by first running a MapReduce job over the indexes that counts terms, and then holding the result in the memory of the master (the search client)? But I'm not sure whether that would be too much data.
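
To make the second question concrete, here is a minimal sketch of the term-counting idea. It is written in Python for brevity and is not Nutch code: the function names (map_index, reduce_stats, idf) and the toy indexes are hypothetical, each index is modelled as a plain list of term lists, and the idf formula is modelled on Lucene's classic 1 + ln(N / (df + 1)).

```python
import math
from collections import Counter


def map_index(index_docs):
    """Map step (hypothetical): document frequencies for one index.

    index_docs is a list of documents, each a list of terms. It stands in
    for the index held by a single search server.
    """
    df = Counter()
    for terms in index_docs:
        df.update(set(terms))  # count each term at most once per document
    return df, len(index_docs)


def reduce_stats(partials):
    """Reduce step (hypothetical): sum per-index counts into global ones."""
    global_df = Counter()
    total_docs = 0
    for df, num_docs in partials:
        global_df.update(df)
        total_docs += num_docs
    return global_df, total_docs


def idf(term, global_df, total_docs):
    """idf from global statistics, assuming the form 1 + ln(N / (df + 1))."""
    return 1.0 + math.log(total_docs / (global_df[term] + 1))


# Two toy indexes, as if served by two different search servers.
index_a = [["nutch", "search"], ["nutch", "crawl"]]
index_b = [["search", "hadoop"], ["lucene", "search"], ["search"]]

global_df, total_docs = reduce_stats([map_index(index_a), map_index(index_b)])
nutch_idf = idf("nutch", global_df, total_docs)
```

The master would hold global_df and total_docs and use them for scoring in place of each server's local counts. The table has one entry per distinct term across all indexes, so its size grows with the vocabulary, which is exactly the memory concern above.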

Stefan

