Cool, so just to be sure, if I disable the coord factor I can finally compare my BooleanQuery results ?
On 28 March 2011 10:11, Uwe Schindler <u...@thetaphi.de> wrote: > Hi Patrick, > > You can disable the coord factor in the constructor of BooleanQuery. > > Uwe > > ----- > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -----Original Message----- > > From: Patrick Diviacco [mailto:patrick.divia...@gmail.com] > > Sent: Monday, March 28, 2011 10:09 AM > > To: java-user@lucene.apache.org > > Subject: Re: comparing lucene scores across queries > > > > Hi, thanks for reply. > > > > Yeah, I've read the Similarity class documentation several times, but I > need > > some tip. > > > > My queries are BooleanQueries but they always have the same structure > > (the same structure of the docs, they are actually docs from collection): > 3 > > fields. > > > > What if I simplify the similarity scores, by removing coord factor and > just > > leaving the cosine similarity which is comparable ? > > > > I want to underline the fact that my boolean queries are just a > combination > > of "field:term" items, and I always have the same 3 fields with different > > terms obviously. > > > > Thanks > > > > > > > > > > On 28 March 2011 10:03, Uwe Schindler <u...@thetaphi.de> wrote: > > > > > No, scores are in general not comparable between different queries. > > > The problem lies in many things: > > > - Each query has a norm factor that makes it more compareable if they > > > are sub clauses of a BooleanQuery. But you are right, this norm factor > > > should be the same. > > > - Some queries like FuzzyQuery rely on the terms in index and those > > > matches the query > > > - Inside Boolean queries, there is also a coord-factor involved > > > > > > If you are always using the same simple type of query (e.g. simple > > > TermQuery, only with different term) on the same index, you can > > > compare the scores. As soon as you are using complex queries (e.g > > > several terms compared in a BooleanQuery as QueryParser produces), the > > > scores are no longer comparable. > > > > > > You can read more on all factors that are included in scoring: > > > > > > > > http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/search/ > > > Simila > > > rity.html > > > > > > ----- > > > Uwe Schindler > > > H.-H.-Meier-Allee 63, D-28213 Bremen > > > http://www.thetaphi.de > > > eMail: u...@thetaphi.de > > > > > > > > > > -----Original Message----- > > > > From: Patrick Diviacco [mailto:patrick.divia...@gmail.com] > > > > Sent: Monday, March 28, 2011 9:44 AM > > > > To: java-user@lucene.apache.org > > > > Subject: comparing lucene scores across queries > > > > > > > > Hi, > > > > > > > > sorry I've already asked few days ago, but I got no reply and I > > > > really > > > need > > > > some help on this.. > > > > > > > > I'm running several queries against a doc collection. The queries > > > > are documents of the collection itself, I need to measure how > > > > similar is each document to the rest of the collection. > > > > > > > > Now, Lucene returns me a score per query, but I've been told such > > > > score > > > is > > > > not comparable across queries. Is this correct ? > > > > > > > > For example, arem't these scores comparable ? > > > > query1, score:8.324234 > > > > query2, score:3.324238 > > > > > > > > If so, why not ? Isn't the cosine similarity between the query > > > > vector and collection docs vectors ? I really need a comparable > measure. > > > > > > > > thanks > > > > > > > > > --------------------------------------------------------------------- > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > > > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > > > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > >