Re: Nutch and Lucene

Andrzej Bialecki Fri, 10 Nov 2006 02:07:36 -0800

hzhong wrote:

Hello,


This is what I want to do: given a document, find all its terms and their
frequencies.

I understand that Nutch is built on top of Lucene. In Lucene, I can access
the terms of a document and their frequencies via the IndexReader. However,
in Nutch, I am not sure if there's an equivalent. In Lucene, the IndexReader
needs to know where the inverted indexes are. In Nutch, I am not sure how
and where to locate the inverted indexes. Is it possible to access the
inverted index from Nutch?

What you need is called a "term vector". Nutch doesn't support this out of
the box, but it's relatively easy to add. You would have to modify
org.apache.nutch.searcher.Searcher and add a method to retrieve the
TermVector, and implement this method in
org.apache.nutch.searcher.IndexSearcher using Lucene classes.
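
For illustration, a term vector is essentially the list of distinct terms in
one document together with how often each occurs. The sketch below computes
that in plain Java, assuming a naive lowercase-and-split tokenizer. The names
TermVectorSketch and termFrequencies are made up for this example and are not
Nutch or Lucene API. In Lucene releases of this period the stored equivalent
is read with IndexReader.getTermFreqVector(docNumber, field), which only
returns data if the field was indexed with term vectors enabled.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: this is not Nutch or Lucene code.
class TermVectorSketch {

    // Returns each distinct term of the text with its occurrence count,
    // sorted by term. Tokenization is a naive lowercase-and-split on
    // anything that is not a letter or digit; a real analyzer would differ.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> freqs = new TreeMap<>();
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        for (String token : tokens) {
            if (token.isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            freqs.merge(token, 1, Integer::sum);
        }
        return freqs;
    }

    public static void main(String[] args) {
        Map<String, Integer> tv =
            termFrequencies("Nutch is built on top of Lucene. Lucene stores terms.");
        // Print one "term<TAB>frequency" pair per line.
        for (Map.Entry<String, Integer> e : tv.entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

A method added to Searcher would return the same kind of information (terms
plus frequencies) for a given document and field, read from the index
rather than recomputed from the text.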


--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
