Hi Andrzej,
* merge 80 segments into 1. A lot of IO involved... and you have to
repeat it from time to time. Ugly.
I agree.
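To make the cost concrete, the index half of such a merge boils down
to something like the following (just a rough sketch; the target path
is invented and it assumes all part indexes were already copied to
local disk):

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.store.Directory;
  import org.apache.lucene.store.FSDirectory;

  public class MergeParts {
    public static void main(String[] args) throws Exception {
      // args: the local paths of the 80 part indexes
      Directory[] parts = new Directory[args.length];
      for (int i = 0; i < args.length; i++) {
        parts[i] = FSDirectory.getDirectory(args[i], false);
      }
      // 'true' creates a fresh target index
      IndexWriter writer = new IndexWriter("/local/merged-index",
          new StandardAnalyzer(), true);
      // reads and rewrites every part: this is where all the IO goes,
      // and it has to be redone after every crawl cycle
      writer.addIndexes(parts);
      writer.optimize();
      writer.close();
    }
  }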
* implement a search server as a map task. Several challenges: it
needs to partition the Lucene index, and it has to copy all parts
of segments and indexes from DFS to the local storage, otherwise
performance will suffer. However, the number of open files per
machine would be reduced, because (ideally) each machine would deal
with only a few, or even a single, segment part and a single index
part...
Well, I played around and already have a kind of prototype.
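Roughly it works like this: a map task that never returns and keeps a
search server alive over its local part of the index. SearchServer
below is only a stand-in for the real serving code, and the property
name is invented:

  import java.io.IOException;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.MapRunnable;
  import org.apache.hadoop.mapred.OutputCollector;
  import org.apache.hadoop.mapred.RecordReader;
  import org.apache.hadoop.mapred.Reporter;

  public class SearchServerRunner implements MapRunnable {

    // stand-in for the real query-serving code
    static class SearchServer {
      private final String indexDir;
      SearchServer(String indexDir) { this.indexDir = indexDir; }
      void start() { /* open the index, bind a port, serve queries */ }
    }

    private JobConf job;

    public void configure(JobConf job) {
      this.job = job;
    }

    public void run(RecordReader input, OutputCollector output,
        Reporter reporter) throws IOException {
      SearchServer server = new SearchServer(job.get("search.index.dir"));
      server.start();
      // never return: the task *is* the server
      while (true) {
        // report status so the tasktracker does not kill the task
        reporter.setStatus("serving queries");
        try { Thread.sleep(10000); } catch (InterruptedException e) { return; }
      }
    }
  }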
I have seen the following problems:
+ having a kind of repository of active search servers
possibility A: find all tasktrackers running a specific task (already
discussed on the Hadoop mailing list).
possibility B: have an rpc server running in the jvm that runs the
search server, add the hostname to the jobconf and, similar to the
task-to-jobtracker heartbeat, let the search server announce itself
via heartbeat to the search server 'repository' (see the first
sketch below).
+ having the index locally and the segments in the dfs.
++ adding to the NutchBean init one dfs for the index and one for the
segments could fix this (see the second sketch below), or, more
generally, adding support for stream handlers like dfs:// vs. file://
(very long term).
+ downloading the index from the dfs before the mapper starts (see
the last sketch below), or just indexing the segment data to the
local hdd and letting the mapper run for the next 30 days?
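For possibility B, the announcing side could be as simple as the
following sketch (SearchServerRepository and all names are invented;
in practice the repository reference would be an rpc proxy):

  // invented protocol for the search server 'repository'
  interface SearchServerRepository {
    void heartbeat(String host, int port, String indexPart);
  }

  // announces this search server in regular intervals, analogous to
  // the task-to-jobtracker heartbeat
  class Announcer extends Thread {
    private final SearchServerRepository repository; // e.g. an rpc proxy
    private final String host;
    private final int port;
    private final String part;

    Announcer(SearchServerRepository repository, String host, int port,
        String part) {
      this.repository = repository;
      this.host = host;
      this.port = port;
      this.part = part;
      setDaemon(true); // die together with the search server jvm
    }

    public void run() {
      while (true) {
        // "I'm alive and serving this part of the index"
        repository.heartbeat(host, port, part);
        try { Thread.sleep(5000); } catch (InterruptedException e) { return; }
      }
    }
  }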
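The two-filesystem idea would look roughly like this (paths are
invented; NutchBean would get both filesystems handed in instead of
assuming a single one):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class TwoFileSystems {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // the index lives on local disk for fast random access ...
      FileSystem indexFs = FileSystem.getLocal(conf);
      Path index = new Path("/local/search/index");
      // ... while the segments stay in the dfs, streamed on demand
      FileSystem segmentFs = FileSystem.get(conf);
      Path segments = new Path("/user/nutch/segments");
      System.out.println("index: " + indexFs.exists(index)
          + ", segments: " + segmentFs.exists(segments));
    }
  }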
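And the download variant: pull the index out of the dfs once in
configure(), before map() ever runs (property name and local path are
invented):

  import java.io.IOException;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.MapReduceBase;

  public class IndexFetcher extends MapReduceBase {
    public void configure(JobConf job) {
      try {
        FileSystem dfs = FileSystem.get(job);
        Path remote = new Path(job.get("search.index.dfs.dir"));
        Path local = new Path("/tmp/search/index");
        // one big copy up front; afterwards the index is read from
        // local disk for however long the task keeps running
        dfs.copyToLocalFile(remote, local);
      } catch (IOException e) {
        throw new RuntimeException("could not fetch the index", e);
      }
    }
  }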
Stefan