Re: Making capitalization significant

Ype Kingma Wed, 30 Oct 2002 13:00:02 -0800

On Wednesday 30 October 2002 20:58, Michael McDonald wrote:
> Is there a way to arrange indexing and searching so that when searching
> for "Lucene", the term "Lucene" would be given more boost than the term
> "lucene", and ideally "lucene" would have more boost than "LUCENE"?


Use an analyzer that preserves the original case, for both indexing and
querying, and boost each casing separately in the query, e.g.:

Lucene^10 lucene^8 LUCENE^6

You want different weights per term, and you can't influence these directly
in the index. Therefore you'll have to query with different term weights.

A problem arises when there are 100 documents mentioning Lucene, and 
one document mentioning LUCENE. The scoring formula favours rare terms,
so with the above query the LUCENE document will likely get the highest
score in spite of its lower boost.

So you'll have to adapt the weights in the query by using the scoring formula
and correcting for the number of documents containing each of the terms.
You can get these counts from IndexReader.docFreq().
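
A minimal sketch of that correction, in Python rather than against the
Lucene API. It assumes the classic Lucene idf, 1 + ln(numDocs / (docFreq + 1)),
and that idf enters a term's score squared (once through the query weight,
once through the document weight); check both against the Similarity of your
version. The function names are hypothetical, and the document frequencies
are made-up numbers passed in as a dict where a real program would call
IndexReader.docFreq().

```python
import math

def idf(doc_freq, num_docs):
    # Assumed classic Lucene idf: rarer terms get a larger value.
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def corrected_boosts(desired, doc_freqs, num_docs):
    """Divide each desired weight by idf squared, so that a rare casing
    does not outrank a common one merely because it is rare."""
    boosts = {}
    for term, weight in desired.items():
        df = doc_freqs.get(term, 0)
        if df == 0:
            continue  # term is not in the index, nothing to query
        boosts[term] = weight / idf(df, num_docs) ** 2
    return boosts

def to_query(boosts):
    # Render the boosts in Lucene query syntax: term^boost term^boost ...
    return " ".join(f"{term}^{boost:.2f}" for term, boost in boosts.items())

desired = {"Lucene": 10, "lucene": 8, "LUCENE": 6}
doc_freqs = {"Lucene": 100, "lucene": 40, "LUCENE": 1}  # made-up counts
print(to_query(corrected_boosts(desired, doc_freqs, num_docs=1000)))
```

With these corrected boosts, boost times idf squared comes back to the
desired 10 : 8 : 6 ratio, whatever the document frequencies are.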

And you'll have to do that for each casing of the queried term, i.e.
2 ** (length of term) times, skipping the ones having zero frequency.
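
Enumerating the casings could look roughly like this in Python. It is only
a sketch: case_variants and indexed_variants are hypothetical names, and the
doc_freq argument stands in for a lookup such as IndexReader.docFreq() on
the real index. Characters without a case (digits, punctuation) yield a
single choice, so the count can be lower than 2 ** (length of term).

```python
from itertools import product

def case_variants(term):
    """All distinct upper/lower casings of term, at most 2 ** len(term)."""
    choices = [{ch.lower(), ch.upper()} for ch in term]
    return {"".join(chars) for chars in product(*choices)}

def indexed_variants(term, doc_freq):
    """Keep only the casings that occur in the index.
    doc_freq is any callable returning the document count for a term."""
    found = {}
    for variant in case_variants(term):
        count = doc_freq(variant)
        if count > 0:  # skip casings with zero frequency
            found[variant] = count
    return found

# Made-up frequencies standing in for the index.
freqs = {"Lucene": 100, "lucene": 40, "LUCENE": 1}
found = indexed_variants("lucene", lambda t: freqs.get(t, 0))
print(len(case_variants("lucene")), sorted(found))
```

The number of lookups doubles with every letter, so for long terms it may be
worth restricting the search to the casings you actually expect.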

Kind regards,
Ype

