Doug:

I agree with all of your comments except the following:

Third, part of the problem seems like there are too few
contributors--that the challenges are big and the resources limited.
Splitting the project will only spread those resources more thinly.

IMHO, there is a lot of duplicated effort (both on and off the FOSS domain).
Crawling, file parsing, analyzers, incremental indexing, etc. are common
discussion topics on every Lucene mailing list, which spreads resources
across many duplicated efforts instead of pooling them behind a common,
agreed-upon high-level API.
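
To make that concrete, a shared crawler contract could look something
like the sketch below (all of these names are hypothetical; no such
interface exists in Nutch today):

// Purely hypothetical sketch of a small, agreed-upon crawler contract.
public interface Crawler {
  /** Register a URL to start fetching from. */
  void addSeed(java.net.URL seed);

  /** Fetch up to maxPages pages, handing each fetched page to the sink. */
  void crawl(int maxPages, PageSink sink) throws java.io.IOException;

  /** Callback so parsers, analyzers and indexers can plug in
      without caring which crawler fetched the page. */
  interface PageSink {
    void accept(java.net.URL url, byte[] content) throws java.io.IOException;
  }
}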

Instead of branching or creating a new project, it is more efficient to
develop libraries (e.g. the Nutch crawler, analyzers, etc.) so that other
projects (on or off the FOSS domain) can re-use them, i.e. code-base
sharing should be easy, not difficult.
This is exactly why NDFS became Hadoop: now anyone can read the Hadoop API
and combine it with Lucene, without much trouble, to run a Lucene index
engine on top of Hadoop.
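
For example (a minimal sketch, assuming the Lucene 2.x and Hadoop
FileSystem APIs of that era; the paths and field names are made up), a
task could build an index on local disk and then publish it to the DFS:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class IndexToDfs {
  public static void main(String[] args) throws Exception {
    // Build the index on the node's local disk; Lucene never
    // needs to know that Hadoop exists.
    String local = "/tmp/part-00000-index";
    IndexWriter writer = new IndexWriter(local, new StandardAnalyzer(), true);

    Document doc = new Document();
    doc.add(new Field("url", "http://example.com/",
                      Field.Store.YES, Field.Index.UN_TOKENIZED));
    doc.add(new Field("content", "hello lucene on hadoop",
                      Field.Store.NO, Field.Index.TOKENIZED));
    writer.addDocument(doc);
    writer.optimize();
    writer.close();

    // Publish the finished index to the DFS; Hadoop only ever
    // moves completed files around.
    FileSystem fs = FileSystem.get(new Configuration());
    fs.copyFromLocalFile(new Path(local), new Path("/indexes/part-00000"));
  }
}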

A crawler or an analyzer can be re-used in the same manner. The same goes
for indexing or searching, as you pointed out previously:

http://www.gossamer-threads.com/lists/lucene/general/41211

Again, I am not really proposing a new project, just easier-to-use,
re-usable code. IMHO, Nutch will be the umbrella project for "a la Google"
search and Solr the one for "a la Enterprise" search, where Lucene is the
index lib and Hadoop is the MapRed/DFS lib. What is missing is a common
crawler lib, a common indexing lib, etc.

Regards
