We have a custom "tagger" application which identifies certain entities (such 
as companies) and assigns a "relevance" value to each entity based on its 
overall relevance in a given document.

Then we index these "tags" into a Lucene index by storing them in an indexed 
field (same name, different values), for example "company=A", "company=B", 
"company=C", etc.

I know how to set the boost on each field according to the relevance value 
from our tagging application.  However, sorting does not seem to work 
properly, since according to the documentation all boost values under fields 
of the same name in a document are combined by multiplying them together:

From http://lucene.apache.org/java/docs/scoring.html:

"For each field of a document, all boosts of that field (i.e. all boosts under 
the same field name in that doc) are multiplied."

So if I have two documents, each with some entities:

Doc1: A (100%), B (50%), C (25%)
Doc2: A (75%), D (50%)

Then a query for A should return Doc1 ahead of Doc2.  But it seems that what 
happens is this:

Doc1 boost = 1.0 * 0.5 * 0.25 = 0.125
Doc2 boost = 0.75 * 0.50 = 0.375

Therefore a query for A returns Doc2 ahead of Doc1.
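
To make the arithmetic concrete, here is a minimal plain-Java sketch. It does 
not call Lucene at all; the class and method names (BoostDemo, combinedBoost, 
tagBoost) are made up for illustration. It only contrasts the multiplied 
boost described in the documentation with the per-tag value I would like the 
ranking to use:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: no Lucene classes are used, and all names are hypothetical.
class BoostDemo {

    // Multiply the boosts of all same-named fields in one document,
    // as the quoted scoring documentation describes.
    static double combinedBoost(Map<String, Double> tags) {
        double product = 1.0;
        for (double boost : tags.values()) {
            product *= boost;
        }
        return product;
    }

    // The value I would like to rank by: the boost of the queried tag alone
    // (0.0 if the document does not carry that tag).
    static double tagBoost(Map<String, Double> tags, String tag) {
        return tags.getOrDefault(tag, 0.0);
    }

    public static void main(String[] args) {
        Map<String, Double> doc1 = new LinkedHashMap<>();
        doc1.put("A", 1.0);
        doc1.put("B", 0.5);
        doc1.put("C", 0.25);

        Map<String, Double> doc2 = new LinkedHashMap<>();
        doc2.put("A", 0.75);
        doc2.put("D", 0.5);

        // Multiplied boosts: Doc2 outranks Doc1.
        System.out.println(combinedBoost(doc1)); // 0.125
        System.out.println(combinedBoost(doc2)); // 0.375

        // Per-tag boosts for A: Doc1 outranks Doc2, which is what I want.
        System.out.println(tagBoost(doc1, "A")); // 1.0
        System.out.println(tagBoost(doc2, "A")); // 0.75
    }
}
```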

Is there a way around this (besides creating a different field name for each 
tag)?  Can I create custom similarity or scoring classes to handle this at 
query time somehow?

Thanks,
Bob
