Doug Cutting wrote:
Yes, this is why I was discouraged and stopped working on this.
However I am now hopeful that sorting the entire index by page score
and using top-1000 might work well with Nutch queries, since page
score is field-independent, and I think fields cause the problems.
Plus, this would be a lot simpler than the cross-field summing
described above.
I can start writing an index-sorter today, unless you are already
working on this. If you have an evaluation framework, that would be
great.
By all means, please start; this is still near the limits of my knowledge
of Lucene... ;-)
My testing framework consists of a bunch of BeanShell scripts and a
test index whose contents I know well (which I'm not at liberty to share).
But I can prepare another index, based, e.g., on the Reuters corpus, and
clean up the scripts somewhat.
I'm interested in following up on this and contributing to a usable conclusion.
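A minimal sketch of the index-sorting idea discussed above, assuming a toy in-memory index. This is not Nutch's or Lucene's code: the class ScoreSortedIndex, its constructor and its search method are hypothetical. Documents are renumbered in descending page-score order, so every postings list is automatically in page-score order too. Because page score does not depend on the field, one global order serves every field, and a query can stop after its first k matches (k = 1000 in the proposal above). The sketch ranks purely by page score; a real engine would presumably re-score the truncated candidate set by query relevance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy index: doc ids are reassigned so that id 0 has the highest page score.
class ScoreSortedIndex {
    private final Map<String, List<Integer>> postings = new HashMap<>(); // term -> ascending new ids
    private final int[] newToOld; // maps a new (score-ordered) id back to the original id

    ScoreSortedIndex(String[][] docTerms, float[] pageScore) {
        int n = pageScore.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        // Sort original ids by descending page score.
        Arrays.sort(order, (a, b) -> Float.compare(pageScore[b], pageScore[a]));
        newToOld = new int[n];
        for (int newId = 0; newId < n; newId++) {
            int oldId = order[newId];
            newToOld[newId] = oldId;
            for (String t : docTerms[oldId]) {
                List<Integer> p = postings.computeIfAbsent(t, key -> new ArrayList<>());
                // New ids arrive in increasing order, so each list stays sorted; skip duplicates.
                if (p.isEmpty() || p.get(p.size() - 1) != newId) p.add(newId);
            }
        }
    }

    // Returns original ids of up to k docs containing all terms, highest page score first.
    List<Integer> search(List<String> terms, int k) {
        List<Integer> hits = new ArrayList<>();
        if (terms.isEmpty()) return hits;
        List<List<Integer>> lists = new ArrayList<>();
        for (String t : terms) {
            List<Integer> p = postings.get(t);
            if (p == null) return hits; // a missing term means no conjunctive match
            lists.add(p);
        }
        int[] pos = new int[lists.size()];
        outer:
        for (int doc : lists.get(0)) {
            // Advance the other lists' cursors up to this candidate.
            for (int i = 1; i < lists.size(); i++) {
                List<Integer> p = lists.get(i);
                while (pos[i] < p.size() && p.get(pos[i]) < doc) pos[i]++;
                if (pos[i] == p.size()) break outer; // a list is exhausted: no more matches
                if (p.get(pos[i]) != doc) continue outer;
            }
            hits.add(newToOld[doc]);
            // Early termination: every later doc has a lower (or equal) page score.
            if (hits.size() == k) break;
        }
        return hits;
    }

    public static void main(String[] args) {
        String[][] docs = {
            {"lucene", "nutch"}, // doc 0
            {"lucene"},          // doc 1
            {"lucene", "nutch"}, // doc 2
            {"nutch"}            // doc 3
        };
        float[] pageScore = {0.2f, 0.9f, 0.7f, 0.5f};
        ScoreSortedIndex idx = new ScoreSortedIndex(docs, pageScore);
        System.out.println(idx.search(List.of("lucene"), 2));
        System.out.println(idx.search(List.of("lucene", "nutch"), 10));
    }
}
```

Running main prints [1, 2] (the two highest-scored docs containing "lucene", found without reading the rest of that postings list) and then [2, 0].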
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com