On Fri, 2009-03-13 at 19:42 -0700, buddha1021 wrote:
hi dennis:
...
> I am confident that Hadoop can process the large datasets of a www search
> engine! But Lucene? I am afraid that the practical size of a Lucene index
> per server is very small, 10 GB? or 30 GB? This is not enough for a www
> search engine! IMO, this is a bottleneck!
I agree that the real problem in working with Lucene indexes is keeping
them small. What good is a distributed ("cloud") index if querying it
takes hours?
To me, this should be one of Nutch's core competencies: making search in
BIG indexes as fast as in SMALL indexes.
I would suggest looking at Katta (http://katta.sourceforge.net/).
It's one of several projects whose goal is to serve very large Lucene
indexes via distributed shards. Solr has also added distributed
(federated) search support.
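The shard approach that Katta and distributed Solr take boils down to
scatter-gather: send the query to every shard, then merge the per-shard
top-k hits by score. A toy sketch of that idea (shard contents and the
term-frequency scoring are made up for illustration, not Katta's or
Solr's actual code):

```python
import heapq

def search_shard(shard, query, k):
    """Score docs in one shard; here score = term frequency (toy scoring)."""
    hits = [(doc.count(query), doc_id) for doc_id, doc in shard.items()]
    return heapq.nlargest(k, (h for h in hits if h[0] > 0))

def distributed_search(shards, query, k):
    """Scatter the query to every shard, then gather and merge the top k."""
    all_hits = []
    for shard in shards:
        all_hits.extend(search_shard(shard, query, k))
    return heapq.nlargest(k, all_hits)

# Toy corpus split across two shards (each stands in for one Lucene index).
shards = [
    {"d1": "lucene index shard", "d2": "hadoop map reduce"},
    {"d3": "lucene lucene scaling", "d4": "nutch crawler"},
]
print(distributed_search(shards, "lucene", 2))
```

Since each shard only returns its own top k, the per-server index stays
small while the merged result covers the whole corpus.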
-- Ken
--
Ken Krugler
+1 530-210-6378