Re: Suggestions for documentation or LIA

Erik Hatcher Wed, 26 Jan 2005 12:26:33 -0800

On Jan 26, 2005, at 10:25 AM, Ian Soboroff wrote:

Erik Hatcher <[EMAIL PROTECTED]> writes:

By all means, if you have other suggestions for our site, let us know
at [EMAIL PROTECTED]


One of the things I would like to see, but which isn't on the Lucene
site, in the documentation, or in "Lucene in Action", is a complete
description of how the retrieval algorithm works.  That is, how the
HitCollector, Scorers, Similarity, etc. all fit together.

I'm involved in a project which to some degree involves poking
deeply into this part of the Lucene code.  We have a nice (non-Lucene)
framework for working with many different kinds of similarity
functions (beyond tf-idf) which should also be expandable to include
query expansion, relevance feedback, and the like.

I used to think that integrating it would be as simple as hacking in
Similarity, but I'm beginning to think it might need broader changes.
I could obviously hook in our whole retrieval setup by just diving for
an IndexReader and doing it all by hand, but then I would have to redo
the incremental search and possibly the rich query structure, which
would be a lose.

So anyway, I got LIA hoping for a good explanation (not a good
Explanation) on this bit, but it wasn't there.

Hacking Similarity wasn't covered in LIA for one simple reason: Lucene's built-in scoring mechanism really is good enough for almost all projects. The book was written for developers of those projects.

Personally, I've not had to hack Similarity, though I've toyed with it in prototypes and am using a minor tweak (turning off length normalization for the "title" field) for the lucenebook.com book indexing.
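
For illustration, a tweak of that kind might look roughly like the sketch below. It is a self-contained stand-in, not Lucene code: BaseSimilaritySketch and NoTitleNormSimilarity are hypothetical names, and the 1/sqrt(numTokens) default is assumed here to mirror what DefaultSimilarity does in its lengthNorm(String fieldName, int numTokens) hook.

```java
// Hypothetical stand-in for a Lucene-style Similarity; not the real Lucene class.
class BaseSimilaritySketch {
    // Assumed default: the norm shrinks as the field's token count grows,
    // so matches in shorter fields score higher.
    float lengthNorm(String fieldName, int numTokens) {
        return (float) (1.0 / Math.sqrt(numTokens));
    }
}

// Turns off length normalization for the "title" field only.
class NoTitleNormSimilarity extends BaseSimilaritySketch {
    @Override
    float lengthNorm(String fieldName, int numTokens) {
        if ("title".equals(fieldName)) {
            // Constant norm: title length no longer affects the score.
            return 1.0f;
        }
        // Every other field keeps the default behaviour.
        return super.lengthNorm(fieldName, numTokens);
    }
}
```

With real Lucene the same override would go in a subclass of DefaultSimilarity. Since norms are written when documents are indexed, the custom Similarity would most likely need to be set on the IndexWriter as well as on the searcher.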

There are some hints
on the Lucene site, but nothing complete.  If I muddle it out before
anything gets contributed, I'll try to write something up, but don't
expect anything too soon...


And maybe you'd contribute what you write to LIA 2nd edition :)

        Erik


