Stephane,

Nutch uses Lucene for indexing, and Lucene has a class called IndexWriter that
is used to add Lucene Documents to an index.  Here is a quick grep through
Nutch's *.java files:

$ ffjg -l IndexWriter
./src/test/org/apache/nutch/indexer/TestDeleteDuplicates.java
./src/java/org/apache/nutch/indexer/IndexMerger.java
./src/java/org/apache/nutch/indexer/IndexSorter.java
./src/java/org/apache/nutch/indexer/Indexer.java
./contrib/web2/plugins/web-query-propose-spellcheck/src/java/org/apache/nutch/spell/NGramSpeller.java
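
In case ffjg is not available on your machine (it looks like a local shell
helper rather than a standard tool), a rough equivalent built from plain find
and grep might look like the sketch below. The function name java_grep is made
up for illustration, and I am assuming ffjg simply greps every *.java file
under the current directory.

```shell
# Hypothetical stand-in for "ffjg": run grep with the given arguments
# over every *.java file below the current directory.
java_grep() {
    find . -name '*.java' -exec grep "$@" {} +
}

# Usage, from the top of a Nutch checkout:
#   java_grep -l IndexWriter
```

The -l flag makes grep print only the names of the matching files, which is
what produces a listing like the one above.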

Otis


----- Original Message ----
From: Stephane Gamard <[EMAIL PROTECTED]>
To: [email protected]
Sent: Wednesday, October 25, 2006 7:22:25 AM
Subject: Nutch Indexing

Hi all,

    I am a researcher in Semantic Analysis and I am very interested in
the Nutch project as a test bed for new indexing methods. As I
understand it (and from the online documentation), Nutch allows for
plugin development and manipulation. It therefore looks promising to be
able to substitute my indexing method for the default Nutch one. Yet I
would love some clarification regarding which plugins are
responsible for indexing the fetched web pages, as they are the
ones I shall be replacing.

    Does this make sense? Am I going the right way?

Thank you,

_Stephane


