Stephane, Nutch uses Lucene for indexing, and Lucene has a class called IndexWriter that is used to index Lucene Documents. Here is a quick grep through Nutch's *.java files:
$ ffjg -l IndexWriter
./src/test/org/apache/nutch/indexer/TestDeleteDuplicates.java
./src/java/org/apache/nutch/indexer/IndexMerger.java
./src/java/org/apache/nutch/indexer/IndexSorter.java
./src/java/org/apache/nutch/indexer/Indexer.java
./contrib/web2/plugins/web-query-propose-spellcheck/src/java/org/apache/nutch/spell/NGramSpeller.java

Otis

----- Original Message ----
From: Stephane Gamard <[EMAIL PROTECTED]>
To: [email protected]
Sent: Wednesday, October 25, 2006 7:22:25 AM
Subject: Nutch Indexing

Hi all,

I am a researcher in semantic analysis, and I am very interested in the Nutch project as a test bed for new indexing methods. As I understand it (and from the documentation online), Nutch allows for plugin development and manipulation, so it looks promising to be able to substitute my indexing method for the default Nutch one. Yet I would love some clarification regarding which plugins are responsible for indexing fetched web pages, as they are the ones I would be replacing. Does this make sense? Am I going the right way?

Thank you,
_Stephane
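The `ffjg -l IndexWriter` search in Otis's reply is how you locate the classes that actually write the Lucene index. `ffjg` looks like a local shell alias rather than a standard tool; the sketch below assumes it means "list the *.java files under the current directory that contain this string" and reproduces that with plain JDK APIs, so it works on a Nutch checkout without any extra tools. The class name `FindJavaUsages` is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for the `ffjg -l` alias: lists *.java files that mention an identifier.
public class FindJavaUsages {

    // Returns every *.java file under root whose text contains the identifier,
    // as "./"-prefixed paths relative to root, sorted for stable output (like `grep -l`).
    static List<String> filesMentioning(Path root, String identifier) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.toString().endsWith(".java"))
                    .filter(p -> contains(p, identifier))
                    .map(p -> "./" + root.relativize(p).toString().replace('\\', '/'))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Plain substring match, like a fixed-string grep; it also matches longer
    // names that merely contain the identifier. Malformed bytes are replaced, not fatal.
    private static boolean contains(Path file, String identifier) {
        try {
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            return text.contains(identifier);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Usage: java FindJavaUsages [rootDir] [identifier]  (defaults: "." and "IndexWriter")
    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        String identifier = args.length > 1 ? args[1] : "IndexWriter";
        for (String f : filesMentioning(root, identifier)) {
            System.out.println(f);
        }
    }
}
```

Run from the top of a Nutch source tree, it should list the same kind of files shown in the reply (Indexer, IndexMerger, IndexSorter, and so on), which are the places to start reading before deciding what to replace.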
