RE: Custom lucene scoring - Dot product between field boost and query boost

Yuval Kesten Tue, 21 Feb 2012 23:18:43 -0800

Hi Em,
1. Regarding performance: the Similarity class (and my subclass as well) 
receives the IDF, TF, and sum-of-squared-weights calculations as inputs; it 
only factors them differently. Even though I ignore the values, they are still 
being computed.
2. I have written this code:
    static {
        Similarity.setDefault(new MySimilarity());
    }
Which means that I am setting the default similarity before doing the indexing 
and obviously before the searching.
Thanks!


-----Original Message-----
From: Em [mailto:[email protected]] 
Sent: Tuesday, February 21, 2012 6:07 PM
To: [email protected]
Subject: Re: Custom lucene scoring - Dot product between field boost and query 
boost

Hi Yuval,

> 1. Performance: I am calculating all the TF/IDF stuff and NORMS for 
> nothing...
You aren't calculating that much, since you declared all those values as 
constants. What are you worried about?

> 2. The score I get from the TopScoreDocCollector is not the same as I 
> get from the Explanation.
> Here is part of my code:
Could you provide us the code where you are setting the Similarity, please?

Kind regards,
Em

Am 21.02.2012 16:18, schrieb Yuval Kesten:
> Hi,
> I want to use Lucene with the following scoring logic:
> When I index my documents I want to set for each field a score/weight.
> When I query my index I want to set for each query term a score/weight.
> 
> I will NEVER index or query with multiple instances of the same field: each 
> query (and each document) contains at most one instance of any field name.
> My fields and query terms are not analyzed; each already consists of a 
> single token.
> 
> I want the score to be simply the dot product of the query's field scores 
> and the document's field scores, counting only the fields whose values match.
> 
> For example:
> Query:
> Field Name | Field Value | Field Score
> 1          | AA          | 0.1
> 7          | BB          | 0.2
> 8          | CC          | 0.3
> 
> Document 1:
> Field Name | Field Value | Field Score
> 1          | AA          | 0.2
> 2          | DD          | 0.8
> 7          | CC          | 0.999
> 10         | FFF         | 0.1
> 
> Document 2:
> Field Name | Field Value | Field Score
> 7          | BB          | 0.3
> 8          | CC          | 0.5
> 
> The scores should be:
> Score(q,d1) = FIELD_1_SCORE_Q * FIELD_1_SCORE_D1 = 0.1 * 0.2 = 0.02
> Score(q,d2) = FIELD_7_SCORE_Q * FIELD_7_SCORE_D2 + FIELD_8_SCORE_Q * 
> FIELD_8_SCORE_D2 = (0.2 * 0.3) + (0.3 * 0.5) = 0.21
> 
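The target scoring above can be written out in plain Java, independent of Lucene, as a reference to check search results against. This is only a sketch of the arithmetic: the class name, the `Weighted` helper type, and the `query()`/`doc1()`/`doc2()` factory methods are hypothetical names introduced for illustration, and the data is copied from the tables above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class DotProductScore {

    /** A field value with its score/weight (hypothetical helper type). */
    static final class Weighted {
        final String value;
        final double weight;
        Weighted(String value, double weight) {
            this.value = value;
            this.weight = weight;
        }
    }

    /** Sums queryWeight * docWeight over fields present in both with equal values. */
    static double score(Map<String, Weighted> query, Map<String, Weighted> doc) {
        double sum = 0.0;
        for (Map.Entry<String, Weighted> e : query.entrySet()) {
            Weighted d = doc.get(e.getKey());
            if (d != null && d.value.equals(e.getValue().value)) {
                sum += e.getValue().weight * d.weight;
            }
        }
        return sum;
    }

    static Map<String, Weighted> query() {
        Map<String, Weighted> m = new HashMap<String, Weighted>();
        m.put("1", new Weighted("AA", 0.1));
        m.put("7", new Weighted("BB", 0.2));
        m.put("8", new Weighted("CC", 0.3));
        return m;
    }

    static Map<String, Weighted> doc1() {
        Map<String, Weighted> m = new HashMap<String, Weighted>();
        m.put("1", new Weighted("AA", 0.2));
        m.put("2", new Weighted("DD", 0.8));
        m.put("7", new Weighted("CC", 0.999)); // same field as the query, different value
        m.put("10", new Weighted("FFF", 0.1));
        return m;
    }

    static Map<String, Weighted> doc2() {
        Map<String, Weighted> m = new HashMap<String, Weighted>();
        m.put("7", new Weighted("BB", 0.3));
        m.put("8", new Weighted("CC", 0.5));
        return m;
    }

    public static void main(String[] args) {
        // Rounded to two decimals to hide floating-point noise.
        System.out.println(String.format(Locale.ROOT, "score(q, d1) = %.2f", score(query(), doc1())));
        System.out.println(String.format(Locale.ROOT, "score(q, d2) = %.2f", score(query(), doc2())));
    }
}
```

Note that field 7 of Document 1 contributes nothing, because its value (CC) differs from the query's value (BB) for that field.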
> What would be the best way to implement it, in terms of accuracy and 
> performance? (I don't need TF and IDF calculations.)
> 
> I currently implement it by setting boosts on the fields and query terms.
> Then I overrode the DefaultSimilarity class:
> 
> public class MySimilarity extends DefaultSimilarity {
> 
>     @Override
>     public float computeNorm(String field, FieldInvertState state) {
>         return state.getBoost();
>     }
> 
>     @Override
>     public float queryNorm(float sumOfSquaredWeights) {
>         return 1;
>     }
> 
>     @Override
>     public float tf(float freq) {
>         return 1;
>     }
> 
>     @Override
>     public float idf(int docFreq, int numDocs) {
>         return 1;
>     }
> 
>     @Override
>     public float coord(int overlap, int maxOverlap) {
>         return 1;
>     }
> 
> }
> 
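One accuracy point is worth checking with this approach: Lucene 3.x stores the value returned by computeNorm in a single byte per field, so a boost such as 0.2 is not kept exactly. The sketch below is a standalone approximation of that lossy round trip. It is modeled on the 3-mantissa-bit encoding of Lucene's SmallFloat.floatToByte315 and is assumed, not guaranteed, to match it bit for bit; the class and method names are hypothetical. Under this sketch 0.2 comes back as 0.1875, while 0.5 survives unchanged.

```java
public class NormPrecision {

    // Assumed to mirror SmallFloat.floatToByte315: 3 mantissa bits, zero exponent 15.
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);
        if (small <= ((63 - 15) << 3)) {
            // Zero or negative maps to 0; tiny positive values map to the smallest code.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= ((63 - 15) << 3) + 0x100) {
            return -1; // overflow: largest representable value
        }
        return (byte) (small - ((63 - 15) << 3));
    }

    // Inverse of encode(): expands the byte back into a float.
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // What a boost looks like after being stored as a one-byte norm and read back.
    static float roundTrip(float boost) {
        return decode(encode(boost));
    }

    public static void main(String[] args) {
        // Field scores taken from the example documents in this thread.
        float[] boosts = {0.2f, 0.8f, 0.999f, 0.1f, 0.3f, 0.5f};
        for (float b : boosts) {
            System.out.println(b + " -> " + roundTrip(b));
        }
    }
}
```

If exact products are required, this suggests comparing the scores returned by the index against a reference computation before relying on norms to carry the field weights.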
> And based on 
> http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/scoring.html 
> this should work.
> Problems:
> 1. Performance: I am calculating all the TF/IDF stuff and NORMS for 
> nothing...
> 2. The score I get from the TopScoreDocCollector is not the same as I get 
> from the Explanation.
> Here is part of my code:
> 
> indexSearcher = new IndexSearcher(IndexReader.open(directory, true));
> TopScoreDocCollector collector = TopScoreDocCollector.create(iTopN, true);
> indexSearcher.search(query, collector);
> ScoreDoc[] hits = collector.topDocs().scoreDocs;
> for (int i = 0; i < hits.length; ++i) {
>     int docId = hits[i].doc;
>     Document d = indexSearcher.doc(docId);
>     double score = hits[i].score;
>     String id = d.get(FIELD_ID);
>     Explanation explanation = indexSearcher.explain(query, docId);
> }
> 
> Thanks!
> 
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

