Re: IndexOptimizer (Re: Lucene performance bottlenecks)

Andrzej Bialecki Mon, 12 Dec 2005 09:51:16 -0800

Doug Cutting wrote:

Yes, this is why I was discouraged and stopped working on this.
However I am now hopeful that sorting the entire index by page score and using top-1000 might work well with Nutch queries, since page score is field-independent, and I think fields cause the problems. Plus, this would be a lot simpler than the cross-field summing described above.
I can start writing an index-sorter today, unless you are already working on this. If you have an evaluation framework, that would be great.

By all means please start, this is still near the limits of my knowledge of Lucene... ;-)

My testing framework consists of a bunch of Beanshell scripts, and a test index that I know of (which I'm not at liberty to share). But I can prepare another index, based e.g. on the Reuters corpus, and clean up the scripts somewhat.


I'm interested in following this up and contributing to a usable conclusion.
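
For readers following the thread, here is a rough sketch of the idea quoted above: renumber documents in descending page-score order, then stop collecting hits once the first N matches are found. It is a toy in-memory inverted index in Python, not Lucene or Nutch code. The names sort_index_by_score and search_top_n, the dict-based index layout, and the AND-only query are all assumptions made for illustration, and the sketch ignores per-query relevance scoring entirely.

```python
from typing import Dict, List, Tuple


def sort_index_by_score(
    docs: Dict[str, str], page_score: Dict[str, float]
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Build a toy inverted index whose doc ids follow descending page score.

    Doc id 0 is the highest-scored page, so every postings list is
    implicitly ordered from best page to worst.
    """
    order = sorted(docs, key=lambda name: page_score[name], reverse=True)
    postings: Dict[str, List[int]] = {}
    for doc_id, name in enumerate(order):
        for term in set(docs[name].split()):
            # doc ids are appended in increasing order, so lists stay sorted
            postings.setdefault(term, []).append(doc_id)
    return order, postings


def search_top_n(
    postings: Dict[str, List[int]], terms: List[str], n: int
) -> List[int]:
    """AND-query over the sorted index that stops after the first n hits."""
    lists = [postings.get(term, []) for term in terms]
    if not lists or any(not plist for plist in lists):
        return []
    lists.sort(key=len)                # drive the scan from the shortest list
    others = [set(plist) for plist in lists[1:]]
    hits: List[int] = []
    for doc_id in lists[0]:
        if all(doc_id in s for s in others):
            hits.append(doc_id)
            if len(hits) == n:         # early exit: later ids score lower
                break
    return hits
```

Because doc ids already encode page-score rank, the first n matches are the n highest-scored matching pages, which is what would make a top-1000 cutoff cheap. Whether the truncated result set is good enough once per-field relevance is taken into account is exactly what the evaluation framework would need to measure.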

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
