You will want to take a look at the index-basic and index-more plugins
as well as the org.apache.nutch.indexer.Indexer class. If you just want
to score documents differently, as opposed to indexing them differently,
take a look at the scoring-opic plugin for an implementation of the
scoring algorithm.
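
To give a rough idea of how indexing plugins fit together, here is a
minimal, self-contained sketch of a filter chain in which each plugin
adds fields to a document before it is written to the index. This is
not Nutch's actual API: the FieldFilter interface, the BasicFields and
TokenCountField classes, and the field names are all hypothetical
stand-ins, and a plain Map takes the place of a Lucene Document. The
real plugins and the Indexer class are the place to look for the exact
signatures.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IndexingFilterSketch {

    /** Hypothetical stand-in for an indexing-filter extension point. */
    interface FieldFilter {
        void filter(Map<String, String> doc, String url, String text);
    }

    /** Hypothetical filter adding basic fields, loosely in the spirit of index-basic. */
    static class BasicFields implements FieldFilter {
        public void filter(Map<String, String> doc, String url, String text) {
            doc.put("url", url);
            doc.put("content", text);
        }
    }

    /** Hypothetical custom filter: stores the number of whitespace-separated tokens. */
    static class TokenCountField implements FieldFilter {
        public void filter(Map<String, String> doc, String url, String text) {
            String trimmed = text.trim();
            int count = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
            doc.put("tokenCount", Integer.toString(count));
        }
    }

    /** Runs every filter in order over one fetched page and returns the resulting fields. */
    static Map<String, String> index(List<FieldFilter> filters, String url, String text) {
        Map<String, String> doc = new LinkedHashMap<>();
        for (FieldFilter f : filters) {
            f.filter(doc, url, text);
        }
        return doc;
    }

    public static void main(String[] args) {
        List<FieldFilter> filters = new ArrayList<>();
        filters.add(new BasicFields());
        filters.add(new TokenCountField());
        System.out.println(index(filters, "http://example.com/", "semantic analysis of web pages"));
    }
}
```

Under this assumption, replacing the default indexing behaviour means
swapping out or adding a filter in the chain rather than rewriting the
indexer itself.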
Dennis Kubes
Otis Gospodnetic wrote:
Stephane,
Nutch uses Lucene for indexing, and Lucene has a class called IndexWriter that
is used for indexing Lucene Documents. Here is a quick grep through Nutch's
*.java files:
$ ffjg -l IndexWriter
./src/test/org/apache/nutch/indexer/TestDeleteDuplicates.java
./src/java/org/apache/nutch/indexer/IndexMerger.java
./src/java/org/apache/nutch/indexer/IndexSorter.java
./src/java/org/apache/nutch/indexer/Indexer.java
./contrib/web2/plugins/web-query-propose-spellcheck/src/java/org/apache/nutch/spell/NGramSpeller.java
Otis
----- Original Message ----
From: Stephane Gamard <[EMAIL PROTECTED]>
To: [email protected]
Sent: Wednesday, October 25, 2006 7:22:25 AM
Subject: Nutch Indexing
Hi all,
I am a researcher in Semantic Analysis and I am very interested in
the Nutch project as a test bed for new indexing methods. As I
understand it (from the online documentation), Nutch allows for
plugin development and manipulation. It therefore looks promising to
be able to substitute my indexing method for the default Nutch one.
Still, I would love some clarification regarding which plugins are
responsible for the indexing of fetched web pages, as they are the
ones I would be replacing.
Does this make sense? Am I going about this the right way?
Thank you,
_Stephane