Help with scoring, coordination factor?

Matthew W. Bilotti Thu, 29 Apr 2004 11:10:23 -0700

Dear Lucene Users,

We are using Lucene 1.4 RC2, and are experiencing curious results that we 
think are related to the coordination term.  Apparently the default 
implementation for coordination is:


(# query terms matched in a document)/(total terms in query)

That seems to imply that, given the queries "A v B v C v D" (a disjunction
of four terms) and "A ^ B ^ C ^ D" (a conjunction of four terms), a
document containing only A would have a coordination score of 1/4 in
either case.
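
To make our reading of the formula concrete, here is a minimal sketch of
the arithmetic. The class name CoordSketch is our own and is not part of
Lucene; the method only mirrors what we understand the default
coord(overlap, maxOverlap) computation to be.

```java
// Hypothetical helper, not Lucene code: models the coordination
// factor as we understand the default implementation.
final class CoordSketch {
    // overlap:    number of query terms found in the document
    // maxOverlap: total number of terms in the query
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    public static void main(String[] args) {
        // A document containing only A, scored against a four-term query.
        System.out.println(coord(1, 4)); // prints 0.25
    }
}
```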

We understand the semantics for coordination where the conjunction of
terms is involved, but for our purposes, we would want coordination for
the disjunction to behave differently.  Take for example these two queries
(1) "A ^ B ^ C", a conjunction of 3 terms, and (2)  "(A v A1 v A2 v A3) ^
(B v B1) ^ (C v C1 v C2)", a conjunction of 3 disjunctions, each of which
contains related terms.

We would like a document containing A, B and C to have the same
coordination score regardless of which query we use.  To us, it makes
sense to model the disjunction "A" as a single term that matches no
matter which of its variants (A, A1, A2 or A3) appears in the document.
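
The behavior we are after can be sketched as follows. This is only an
illustration of the semantics we want, not Lucene code: the class
GroupCoordSketch and its methods are names we made up. groupCoord counts
each disjunction as one unit, while termCoord counts every individual
term, which is the kind of counting we suspect is hurting us.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration, not Lucene code.
final class GroupCoordSketch {
    // Desired: fraction of groups with at least one term in the document.
    static float groupCoord(List<Set<String>> groups, Set<String> docTerms) {
        int matched = 0;
        for (Set<String> group : groups) {
            for (String term : group) {
                if (docTerms.contains(term)) {
                    matched++;
                    break; // one hit is enough for the whole group
                }
            }
        }
        return matched / (float) groups.size();
    }

    // For contrast: fraction of all individual terms found in the document.
    static float termCoord(List<Set<String>> groups, Set<String> docTerms) {
        int matched = 0;
        int total = 0;
        for (Set<String> group : groups) {
            for (String term : group) {
                total++;
                if (docTerms.contains(term)) {
                    matched++;
                }
            }
        }
        return matched / (float) total;
    }

    static Set<String> set(String... terms) {
        return new HashSet<String>(Arrays.asList(terms));
    }

    // Query (2): (A v A1 v A2 v A3) ^ (B v B1) ^ (C v C1 v C2)
    static List<Set<String>> queryTwo() {
        List<Set<String>> groups = new ArrayList<Set<String>>();
        groups.add(set("A", "A1", "A2", "A3"));
        groups.add(set("B", "B1"));
        groups.add(set("C", "C1", "C2"));
        return groups;
    }

    public static void main(String[] args) {
        Set<String> doc = set("A", "B", "C");
        System.out.println(groupCoord(queryTwo(), doc)); // 3 of 3 groups
        System.out.println(termCoord(queryTwo(), doc));  // 3 of 9 terms
    }
}
```

For a document containing A, B and C, the group-level factor is 3/3,
the same as for query (1), whereas counting individual terms gives 3/9.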

The results we are seeing show documents we are interested in (say, ones
that contain A, B and C) taking a rank penalty when we use query (2) 
rather than query (1).  We suspect the coordination term is driving down
these documents' ranks, and we would like to bring those documents back up
to where they should be.

Is there a relatively easy way to implement what we want using Lucene?
Would it be better to supply a Similarity class with a special-purpose
coord method, or to subclass Term to create some kind of term "glob" that
would match any of a number of strings (a disjunction)?
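
To show what we mean by the first option, here is a deliberately
simplified model rather than real Lucene code. CoordPolicy is a stand-in
we invented for the coord(overlap, maxOverlap) hook that we understand
Similarity to expose. The model assumes that every matching term
contributes 1.0, that coordination is applied at each nested boolean
level, and it ignores tf, idf and queryNorm entirely.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Simplified model, not Lucene code. All names here are hypothetical.
final class NestedCoordSketch {
    // Stand-in for a coord(overlap, maxOverlap) hook.
    interface CoordPolicy {
        float coord(int overlap, int maxOverlap);
    }

    // Our understanding of the default: fraction of clauses matched.
    static final CoordPolicy DEFAULT =
        (overlap, maxOverlap) -> overlap / (float) maxOverlap;

    // Candidate replacement: no coordination penalty at any level.
    static final CoordPolicy FLAT = (overlap, maxOverlap) -> 1.0f;

    // Scores one disjunction: each matching term contributes 1.0
    // (assumption), scaled by the coordination factor for the group.
    static float scoreGroup(String[] group, Set<String> doc, CoordPolicy policy) {
        int matched = 0;
        for (String term : group) {
            if (doc.contains(term)) {
                matched++;
            }
        }
        if (matched == 0) {
            return 0.0f;
        }
        return policy.coord(matched, group.length) * matched;
    }

    // Scores a conjunction of disjunctions; 0 if any group fails to match.
    static float scoreQuery(String[][] groups, Set<String> doc, CoordPolicy policy) {
        float sum = 0.0f;
        for (String[] group : groups) {
            float s = scoreGroup(group, doc, policy);
            if (s == 0.0f) {
                return 0.0f;
            }
            sum += s;
        }
        // All groups matched, so the top-level overlap equals the group count.
        return policy.coord(groups.length, groups.length) * sum;
    }

    static final String[][] QUERY_1 = { {"A"}, {"B"}, {"C"} };
    static final String[][] QUERY_2 = {
        {"A", "A1", "A2", "A3"}, {"B", "B1"}, {"C", "C1", "C2"}
    };

    public static void main(String[] args) {
        Set<String> doc = new HashSet<String>(Arrays.asList("A", "B", "C"));
        System.out.println(scoreQuery(QUERY_1, doc, DEFAULT)); // prints 3.0
        System.out.println(scoreQuery(QUERY_2, doc, DEFAULT)); // 1/4 + 1/2 + 1/3
        System.out.println(scoreQuery(QUERY_2, doc, FLAT));    // prints 3.0
    }
}
```

Under these assumptions the document scores 3.0 against query (1) but
only about 1.08 against query (2) with the default policy, and 3.0 again
with the flat policy.  One drawback we can see is that a hook receiving
only the two counts cannot tell a disjunction of related terms from any
other boolean query, so a flat policy would remove coordination
everywhere.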

Any advice you can give us would be greatly appreciated!  Thanks in 
advance!

Best regards,
Matthew

PS: Is it now possible by any chance to merge documents retrieved by two 
Lucene queries by score, owing perhaps to the queryNorm factor?  Just 
curious.

-- 
matthew w. bilotti
computer science and artificial intelligence laboratory
massachusetts institute of technology




