Hi Paul,

Thanks a lot for the reply. Are there any published writings (papers or journal articles) on Lucene scoring? Did the Lucene developers/designers base it on a published paper or an established, tested method, or did they propose a new scoring formula specific to Lucene? I am asking because I need to cite this in the report I am currently writing.

Regards,
Maggy
________________________________
From: Paul Elschot [mailto:[EMAIL PROTECTED]]
Sent: Tue 7/3/2007 2:56 PM
To: [email protected]
Subject: Re: clarification on booleanScorer

Maggy,

On [EMAIL PROTECTED] there is normally a higher chance of getting a response.

You may have missed this:
http://lucene.apache.org/java/docs/scoring.html

Your analysis below is correct; only a few points need to be added:
- the coordination factor, which favours documents that match more clauses (for prefix queries normally no coordination is used),
- your examples are nested boolean queries, so all of this applies at each level, and
- the idf computation is a bit more involved; see the reference above.

Regards,
Paul Elschot

On Tuesday 03 July 2007 01:38, #MAGGY ANASTASIA SURYANTO# wrote:
> Hi all,
>
> I would like to clarify my understanding of the way Lucene scores boolean queries, in relation to the "+" and " " clause attributes (required and optional) as well as the OR and AND operators.
>
> After looking at the BooleanScorer source code, this is my understanding of the scoring:
>
> 1. OR is translated into " " (optional) and AND is translated into "+" (required) by QueryParser. So, is it true that
>    (t1 t2 t3) AND (t4 t5 t6) OR (t7 t8 t9)
>    is parsed by QueryParser into the following boolean query?
>    +(t1 t2 t3) +(t4 t5 t6) (t7 t8 t9)
>
> 2. Using the default similarity, the score of a document, score(q,d), is the sum of the tf-idf measures of the terms in q that appear in d.
>
> 3. The score of a document with respect to a BooleanClause BC, score(BC,d), is the sum of the document's scores with respect to all sub-clauses of BC.
>
> 4. There is no difference in how "+" clauses and " " clauses are treated in scoring (i.e. their scorer.score() values are summed together to produce their parent's score). However, the scores of " " clauses are only added once the document has matched all "+" clauses. If a document does not match all "+" clauses, it is not retrieved.
>
>       ----C1----- ----C2----- ----C3----  -----C4------
> q = +{+(t1 t2 t3) +(t4 t5 t6) (t7 t8 t9)} {t10 t11 t12}
>
> Assuming a document d matches C1 and C2, then s(q,d) = (s(C1,d) + s(C2,d) + s(C3,d)) + s(C4,d), where a clause that d does not match contributes 0.
>
> Please let me know whether the above is true. If I have misunderstood anything about how BooleanScorer computes the score, please let me know.
>
> Best regards,
>
> Maggy
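Paul's points about the coordination factor and idf can be made concrete with the formula on the scoring page linked above: score(q,d) = coord(q,d) · queryNorm(q) · Σ_t tf(t in d) · idf(t)² · boost(t) · norm(t,d). A minimal sketch in plain Java (no Lucene dependency) for a flat query of optional terms, assuming the Lucene 2.x DefaultSimilarity definitions: tf = sqrt(freq), idf = ln(numDocs / (docFreq + 1)) + 1, coord = overlap / maxOverlap, queryNorm = 1 / sqrt(sumOfSquaredWeights). The class, method and field names are hypothetical; real Lucene uses float arithmetic and stores norm(t,d) in a single lossy byte, which this sketch ignores.

```java
/**
 * A minimal sketch (no Lucene dependency) of the classic Lucene 2.x scoring
 * formula for a flat query of optional terms. Formulas follow DefaultSimilarity;
 * all class, method and field names here are hypothetical.
 */
public class ClassicScoreSketch {

    /** Hypothetical input: one query term and its statistics for the document being scored. */
    static final class QueryTerm {
        final double boost;   // query-time boost of the term
        final int docFreq;    // number of documents containing the term
        final int freqInDoc;  // occurrences of the term in this document (0 = no match)
        final double norm;    // norm(t,d): field boost * lengthNorm, taken as given

        QueryTerm(double boost, int docFreq, int freqInDoc, double norm) {
            this.boost = boost;
            this.docFreq = docFreq;
            this.freqInDoc = freqInDoc;
            this.norm = norm;
        }
    }

    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    static double coord(int overlap, int maxOverlap) {
        return overlap / (double) maxOverlap;
    }

    static double queryNorm(double sumOfSquaredWeights) {
        return 1.0 / Math.sqrt(sumOfSquaredWeights);
    }

    /** score(q,d) = coord(q,d) * queryNorm(q) * sum over matching t of tf * idf^2 * boost * norm. */
    static double score(QueryTerm[] query, int numDocs) {
        double sumOfSquaredWeights = 0.0; // over ALL query terms, matching or not
        double sum = 0.0;                 // over matching terms only
        int overlap = 0;                  // number of query terms found in the document
        for (QueryTerm t : query) {
            double idf = idf(t.docFreq, numDocs);
            double weight = idf * t.boost;
            sumOfSquaredWeights += weight * weight;
            if (t.freqInDoc > 0) {
                overlap++;
                sum += tf(t.freqInDoc) * idf * idf * t.boost * t.norm;
            }
        }
        if (overlap == 0) {
            return 0.0; // the document does not match the query at all
        }
        return coord(overlap, query.length) * queryNorm(sumOfSquaredWeights) * sum;
    }

    public static void main(String[] args) {
        int numDocs = 1000;
        QueryTerm[] query = {
            new QueryTerm(1.0, 9, 4, 0.5),   // rare term, occurs 4 times in d
            new QueryTerm(1.0, 99, 1, 0.5),  // more common term, occurs once in d
            new QueryTerm(1.0, 499, 0, 0.5)  // common term, absent from d
        };
        // Only 2 of the 3 terms match, so coord(q,d) = 2/3 scales the whole score down.
        System.out.println("score(q,d) = " + score(query, numDocs));
    }
}
```

For a query with a single matching term, coord and queryNorm cancel out and the score reduces to tf · idf · norm, which makes a quick sanity check.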
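Step 4 and Paul's remark that everything applies at each level of a nested boolean query can be checked on the C1..C4 example with a small Java sketch. It does not use Lucene: the leaf scores (1.0, 0.5, 2.0, 0.8) are made-up numbers standing in for per-term tf-idf scores, and it assumes d contains t1, t2, t5 and t11. The hypothetical bool(...) node rejects the document if any "+" clause misses; otherwise it sums the scores of all matching clauses and, optionally, multiplies by coord = matched clauses / (required + optional clauses) at every level, which is how I understand BooleanScorer2's coordinator to count. It models only the arithmetic, not BooleanScorer's document-at-a-time iteration or the delayed addition of optional scores.

```java
import java.util.OptionalDouble;

/**
 * A minimal sketch of how scores combine in nested boolean queries.
 * Leaf scores are hypothetical numbers standing in for per-term scores;
 * coord = matched / (required + optional clauses) is applied at every level.
 */
public class NestedBooleanSketch {

    enum Occur { REQUIRED, OPTIONAL }

    /** A query node; score() is empty when the document does not match the node. */
    interface Node {
        OptionalDouble score();
    }

    static final class Clause {
        final Occur occur;
        final Node node;

        Clause(Occur occur, Node node) {
            this.occur = occur;
            this.node = node;
        }
    }

    static Node hit(double s) { return () -> OptionalDouble.of(s); }     // term found in d
    static Node miss() { return OptionalDouble::empty; }                // term absent from d
    static Clause req(Node n) { return new Clause(Occur.REQUIRED, n); } // "+" clause
    static Clause opt(Node n) { return new Clause(Occur.OPTIONAL, n); } // " " clause

    /** All REQUIRED clauses must match; every matching clause, required or optional, adds its score. */
    static Node bool(boolean useCoord, Clause... clauses) {
        return () -> {
            double sum = 0.0;
            int matched = 0;
            for (Clause c : clauses) {
                OptionalDouble s = c.node.score();
                if (s.isPresent()) {
                    sum += s.getAsDouble();
                    matched++;
                } else if (c.occur == Occur.REQUIRED) {
                    return OptionalDouble.empty(); // a missing "+" clause rejects d
                }
            }
            if (matched == 0) {
                return OptionalDouble.empty(); // only optional clauses, none matched
            }
            double coord = useCoord ? matched / (double) clauses.length : 1.0;
            return OptionalDouble.of(coord * sum);
        };
    }

    /** q = +{+(t1 t2 t3) +(t4 t5 t6) (t7 t8 t9)} {t10 t11 t12}, with d containing t1, t2, t5 and t11. */
    static Node exampleQuery(boolean useCoord) {
        Node c1 = bool(useCoord, opt(hit(1.0)), opt(hit(0.5)), opt(miss()));
        Node c2 = bool(useCoord, opt(miss()), opt(hit(2.0)), opt(miss()));
        Node c3 = bool(useCoord, opt(miss()), opt(miss()), opt(miss()));
        Node c4 = bool(useCoord, opt(miss()), opt(hit(0.8)), opt(miss()));
        Node inner = bool(useCoord, req(c1), req(c2), opt(c3));
        return bool(useCoord, req(inner), opt(c4));
    }

    public static void main(String[] args) {
        System.out.println("without coord: " + exampleQuery(false).score().getAsDouble());
        System.out.println("with coord:    " + exampleQuery(true).score().getAsDouble());
    }
}
```

With useCoord = false the result is the plain nested sum from the quoted formula (4.3 for these numbers). With coord enabled, C1, C2, C4 and the middle level are each scaled down (to about 1.378 overall), which is why the coordination factor matters at every level.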
