Doug Cutting wrote:

> Yes, this is why I was discouraged and stopped working on this.
>
> However I am now hopeful that sorting the entire index by page score and using top-1000 might work well with Nutch queries, since page score is field-independent, and I think fields cause the problems. Plus, this would be a lot simpler than the cross-field summing described above.
>
> I can start writing an index-sorter today, unless you are already working on this. If you have an evaluation framework, that would be great.


By all means, please start; this is still near the limits of my knowledge of Lucene... ;-)

My testing framework consists of a bunch of BeanShell scripts and a test index I know well (which I'm not at liberty to share). But I can prepare another index, based e.g. on the Reuters corpus, and clean up the scripts somewhat.

I'm interested in following this up and contributing to a usable conclusion.
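A minimal sketch of the "sort the index by page score, then use the top 1000" idea, written in Java because Lucene and Nutch are. This is not Lucene's or Nutch's API. Documents are modeled as plain arrays indexed by document id, the `matches` array stands in for the set of documents a query matches, and the class and method names are hypothetical. Renumbering documents by descending page score means a scan in document-id order visits high-scoring pages first, so evaluation can stop after the first K matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical sketch: renumber documents by descending page score, then
 * collect only the first k matching documents in the new order.
 * Not the Lucene/Nutch API; everything is modeled with plain arrays.
 */
public class SortedIndexSketch {

    /** Returns old doc ids ordered by descending page score; the array position is the new doc id. */
    static int[] sortByPageScore(float[] pageScore) {
        Integer[] ids = new Integer[pageScore.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        // Arrays.sort on objects is stable, so ties keep their original relative order.
        Arrays.sort(ids, Comparator.comparingDouble((Integer i) -> pageScore[i]).reversed());
        int[] newToOld = new int[ids.length];
        for (int i = 0; i < ids.length; i++) {
            newToOld[i] = ids[i];
        }
        return newToOld;
    }

    /**
     * Scans documents in new-id order (highest page score first) and stops
     * as soon as k matching documents have been collected.
     */
    static List<Integer> topKMatches(int[] newToOld, boolean[] matches, int k) {
        List<Integer> hits = new ArrayList<>();
        for (int newId = 0; newId < newToOld.length && hits.size() < k; newId++) {
            int oldId = newToOld[newId];
            if (matches[oldId]) {
                hits.add(oldId);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        float[] pageScore = {0.2f, 0.9f, 0.5f, 0.7f, 0.1f};
        boolean[] matches = {true, false, true, true, true};
        int[] order = sortByPageScore(pageScore);
        System.out.println(topKMatches(order, matches, 2)); // prints [3, 2]
    }
}
```

In a real index the renumbering would presumably be applied once to the stored postings, so that every posting list is already in page-score order and the truncated hits could then be ranked by the full query score. Whether that truncation preserves result quality is what an evaluation framework would have to measure.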

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

