Dear Lucene Users,

We are using Lucene 1.4 RC2 and are seeing curious results that we think are related to the coordination factor. As we understand it, the default implementation of coordination is:
  coord = (number of query terms matched in the document) / (total number of terms in the query)

That seems to imply that, given the queries "A v B v C v D" (a disjunction of four terms) and "A ^ B ^ C ^ D" (a conjunction of four terms), a document containing only A would receive a coordination score of 1/4 under either query. We understand the semantics of coordination for a conjunction of terms, but for our purposes we would want coordination over a disjunction to behave differently.

Take, for example, these two queries:

  (1) "A ^ B ^ C", a conjunction of three terms, and
  (2) "(A v A1 v A2 v A3) ^ (B v B1) ^ (C v C1 v C2)", a conjunction of three disjunctions, each of which contains related terms.

We would like a document containing A, B and C to receive the same coordination score regardless of which query we use. To us, it makes sense to model the disjunction "A" (that is, A v A1 v A2 v A3) as a single term that matches no matter which of its variants appears in the document.

The results we are seeing show the documents we are interested in (say, ones that contain A, B and C) taking a rank penalty when we use query (2) rather than query (1). We suspect the coordination factor is driving down these documents' ranks, and we would like to bring them back up to where they should be.

Is there a relatively easy way to implement what we want using Lucene? Would it be better to supply a Similarity class with a special-purpose coord method, or to subclass Term to create some kind of term "glob" that matches any of a number of strings (a disjunction)?

Any advice you can give us would be greatly appreciated. Thanks in advance!

Best regards,
Matthew

PS: Is it now possible, perhaps owing to the queryNorm factor, to merge the documents retrieved by two Lucene queries by score? Just curious.

-- 
Matthew W. Bilotti
Computer Science and Artificial Intelligence Laboratory
Massachusetts Institute of Technology
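To make the suspected effect concrete, here is a minimal sketch in plain Java (it does not use Lucene). It assumes that coord = overlap / maxOverlap, as described above, and that the factor is applied separately at each level of a nested query, so the outer conjunction and each inner disjunction get their own coord. It also assumes every matching term contributes a fixed score of 1.0, so only the coord factors differ; real Lucene scores also involve tf, idf, norms and queryNorm. The class CoordSketch and its methods are hypothetical names, and globCoord models the "glob" idea (a disjunction counts as fully matched when any of its terms appears), not an existing Lucene method.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Toy model (hypothetical, not Lucene code) of how a coordination factor
 * affects a conjunction of terms versus a conjunction of disjunctions.
 * Assumption: every matching term contributes a fixed score of 1.0.
 */
public class CoordSketch {

    // Default-style coord: matched clauses / total clauses.
    static float defaultCoord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Hypothetical "glob" coord: a disjunction is treated as one term,
    // fully satisfied as soon as any of its terms matches.
    static float globCoord(int overlap, int maxOverlap) {
        return overlap > 0 ? 1.0f : 0.0f;
    }

    // Score of one disjunction of related terms against a document.
    static float disjunctionScore(List<String> terms, Set<String> doc, boolean glob) {
        int overlap = 0;
        for (String t : terms) {
            if (doc.contains(t)) {
                overlap++;
            }
        }
        float sum = overlap * 1.0f; // each matching term contributes 1.0
        float coord = glob ? globCoord(overlap, terms.size())
                           : defaultCoord(overlap, terms.size());
        return sum * coord;
    }

    // Score of a conjunction of disjunctions: every group is required,
    // so any unmatched group makes the whole query fail (score 0).
    static float conjunctionScore(List<List<String>> groups, Set<String> doc, boolean glob) {
        float sum = 0.0f;
        int overlap = 0;
        for (List<String> group : groups) {
            float s = disjunctionScore(group, doc, glob);
            if (s == 0.0f) {
                return 0.0f; // a required clause did not match
            }
            sum += s;
            overlap++;
        }
        // Outer coord is always 1 here, since all required groups matched.
        return sum * defaultCoord(overlap, groups.size());
    }

    public static void main(String[] args) {
        Set<String> doc = new HashSet<>(Arrays.asList("A", "B", "C"));

        // Query (1): A ^ B ^ C, modeled as three single-term groups.
        List<List<String>> q1 = Arrays.asList(
            Arrays.asList("A"), Arrays.asList("B"), Arrays.asList("C"));
        // Query (2): (A v A1 v A2 v A3) ^ (B v B1) ^ (C v C1 v C2).
        List<List<String>> q2 = Arrays.asList(
            Arrays.asList("A", "A1", "A2", "A3"),
            Arrays.asList("B", "B1"),
            Arrays.asList("C", "C1", "C2"));

        System.out.printf("query (1), default coord: %.4f%n", conjunctionScore(q1, doc, false));
        System.out.printf("query (2), default coord: %.4f%n", conjunctionScore(q2, doc, false));
        System.out.printf("query (2), glob coord:    %.4f%n", conjunctionScore(q2, doc, true));
    }
}
```

Under these assumptions, main prints 3.0000 for query (1), 1.0833 for query (2) with the default coord (1/4 + 1/2 + 1/3 from the three inner disjunctions), and 3.0000 for query (2) with the glob coord. Note that Lucene's Similarity.coord(int overlap, int maxOverlap) only sees the two counts, so a special-purpose coord meant for the inner disjunctions alone would need some other way to tell them apart; that wiring is not shown here.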
