Hi Em,
1. Regarding performance: the Similarity class (and my subclass as well)
receives the IDF, TF and sum-of-squared-weights values as inputs; it just
combines them differently. Even though I ignore those values, they are still
being computed.
2. I have written this code:
static {
    Similarity.setDefault(new MySimilarity());
}
This means I set the default similarity before indexing and, obviously, before
searching.
Thanks!
-----Original Message-----
From: Em [mailto:[email protected]]
Sent: Tuesday, February 21, 2012 6:07 PM
To: [email protected]
Subject: Re: Custom lucene scoring - Dot product between field boost and query boost
Hi Yuval,
> 1. Performance: I am computing all the TF/IDF values and norms for
> nothing.
You aren't calculating that much, since you declared all those values as
constants. What are you worried about?
> 2. The score I get from the TopScoreDocCollector is not the same as the one
> I get from the Explanation.
> Here is part of my code:
Could you show us the code where you set the Similarity, please?
Kind regards,
Em
On 21.02.2012 16:18, Yuval Kesten wrote:
> Hi,
> I want to use Lucene with the following scoring logic:
> When I index my documents, I want to set a score/weight for each field.
> When I query my index, I want to set a score/weight for each query term.
>
> I will NEVER index or query with multiple instances of the same field: each
> query (and each document) contains at most one instance of a given field name.
> My fields and query terms are not analyzed; each one already consists of a
> single token.
>
> I want the score to be simply the dot product of the query's field weights
> and the document's field weights, counting only fields whose values match.
>
> For example:
>
> Query:
>   Field Name | Field Value | Field Score
>   1          | AA          | 0.1
>   7          | BB          | 0.2
>   8          | CC          | 0.3
>
> Document 1:
>   Field Name | Field Value | Field Score
>   1          | AA          | 0.2
>   2          | DD          | 0.8
>   7          | CC          | 0.999
>   10         | FFF         | 0.1
>
> Document 2:
>   Field Name | Field Value | Field Score
>   7          | BB          | 0.3
>   8          | CC          | 0.5
>
> The scores should be:
> Score(q,d1) = FIELD_1_SCORE_Q * FIELD_1_SCORE_D1 = 0.1 * 0.2 = 0.02
> (field 7 does not count for Document 1, because its value CC differs from
> the query's BB)
> Score(q,d2) = FIELD_7_SCORE_Q * FIELD_7_SCORE_D2 + FIELD_8_SCORE_Q *
> FIELD_8_SCORE_D2 = (0.2 * 0.3) + (0.3 * 0.5) = 0.21
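For reference, the intended score can be computed directly, outside Lucene, to have something to compare Lucene's numbers against. This is a hypothetical standalone helper (the names `DotProductScore`, `Weighted`, `put` and `score` are made up for illustration and are not Lucene API); it assumes each field name appears at most once per query or document, as stated in the question, and sums query weight times document weight over fields whose name and value both match, using the example data from the tables.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of Lucene): computes the intended score
// directly, as a sparse dot product over fields whose name and value match.
public class DotProductScore {

    // One field instance: its single-token value and its weight (boost).
    static final class Weighted {
        final String value;
        final double weight;

        Weighted(String value, double weight) {
            this.value = value;
            this.weight = weight;
        }
    }

    // Adds (or replaces) field `name` in `fields`.
    static void put(Map<String, Weighted> fields, String name, String value, double weight) {
        fields.put(name, new Weighted(value, weight));
    }

    // Sum of queryWeight * docWeight over fields present in both maps with
    // equal values; every other field contributes 0.
    static double score(Map<String, Weighted> query, Map<String, Weighted> doc) {
        double sum = 0.0;
        for (Map.Entry<String, Weighted> e : query.entrySet()) {
            Weighted d = doc.get(e.getKey());
            if (d != null && d.value.equals(e.getValue().value)) {
                sum += e.getValue().weight * d.weight;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Weighted> q = new HashMap<>();
        put(q, "1", "AA", 0.1);
        put(q, "7", "BB", 0.2);
        put(q, "8", "CC", 0.3);

        Map<String, Weighted> d1 = new HashMap<>();
        put(d1, "1", "AA", 0.2);
        put(d1, "2", "DD", 0.8);
        put(d1, "7", "CC", 0.999); // same field name as the query, different value: ignored
        put(d1, "10", "FFF", 0.1);

        Map<String, Weighted> d2 = new HashMap<>();
        put(d2, "7", "BB", 0.3);
        put(d2, "8", "CC", 0.5);

        // Printed values carry the usual double rounding noise.
        System.out.println("score(q,d1) = " + score(q, d1));
        System.out.println("score(q,d2) = " + score(q, d2));
    }
}
```

Running `main` prints values within rounding error of 0.02 and 0.21, so comparisons against Lucene's output should use a tolerance rather than exact equality.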
>
> What would be the best way to implement this, in terms of both accuracy and
> performance? (I don't need TF and IDF calculations.)
>
> I currently implement it by setting boosts on the fields and query terms,
> and I overrode the DefaultSimilarity class:
>
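> import org.apache.lucene.index.FieldInvertState;
> import org.apache.lucene.search.DefaultSimilarity;
>
> public class MySimilarity extends DefaultSimilarity {
>
>     // Store only the index-time boost as the norm (no length normalization).
>     @Override
>     public float computeNorm(String field, FieldInvertState state) {
>         return state.getBoost();
>     }
>
>     // Disable query normalization so query boosts are used as-is.
>     @Override
>     public float queryNorm(float sumOfSquaredWeights) {
>         return 1;
>     }
>
>     // Ignore term frequency.
>     @Override
>     public float tf(float freq) {
>         return 1;
>     }
>
>     // Ignore inverse document frequency.
>     @Override
>     public float idf(int docFreq, int numDocs) {
>         return 1;
>     }
>
>     // Do not reward documents for matching more query clauses.
>     @Override
>     public float coord(int overlap, int maxOverlap) {
>         return 1;
>     }
> }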
>
> According to the scoring documentation at
> http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/scoring.html
> this approach should work.
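As I read the Lucene 3.x scoring documentation linked above, the practical scoring function and what it collapses to under MySimilarity's overrides look roughly like this. Here w_q(t) (the query-term boost) and w_d(t) (the field boost that computeNorm returns) are notation introduced for this note, and only query terms that actually match the document are scored at all.

```latex
\mathrm{score}(q,d) = \mathrm{coord}(q,d)\,\mathrm{queryNorm}(q)
  \sum_{t \in q} \mathrm{tf}(t \in d)\,\mathrm{idf}(t)^2\,\mathrm{boost}(t)\,\mathrm{norm}(t,d)

% MySimilarity: coord = queryNorm = tf = idf = 1,
% boost(t) = w_q(t), norm(t,d) = w_d(t) (the field boost from computeNorm)
\mathrm{score}(q,d) = \sum_{t \in q,\ t \text{ matches } d} w_q(t)\, w_d(t)
```

For Document 2 this gives 0.2 · 0.3 + 0.3 · 0.5 = 0.21, but only up to the precision with which Lucene stores norm(t,d) in the index.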
> Problems:
> 1. Performance: I am computing all the TF/IDF values and norms for
> nothing.
> 2. The score I get from the TopScoreDocCollector is not the same as the one
> I get from the Explanation.
> Here is part of my code:
>
> indexSearcher = new IndexSearcher(IndexReader.open(directory, true));
> TopScoreDocCollector collector = TopScoreDocCollector.create(iTopN, true);
> indexSearcher.search(query, collector);
> ScoreDoc[] hits = collector.topDocs().scoreDocs;
> for (int i = 0; i < hits.length; ++i) {
>     int docId = hits[i].doc;
>     Document d = indexSearcher.doc(docId);
>     double score = hits[i].score;
>     String id = d.get(FIELD_ID);
>     Explanation explanation = indexSearcher.explain(query, docId);
> }
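One detail that can make Lucene's numbers differ from the hand-computed dot products: in Lucene 3.x, DefaultSimilarity stores each norm in a single byte (Similarity.encodeNormValue delegates to SmallFloat.floatToByte315, which keeps only 3 mantissa bits), so index-time boosts are quantized while query boosts are not. The standalone sketch below re-implements that encoding from memory, without depending on Lucene; the class name `NormQuantization` and its methods are hypothetical, and it is worth checking against the SmallFloat source before relying on it. As far as I know the decoded norm feeds both the collected score and the Explanation, so this is more likely to explain a gap between Lucene and the expected 0.02 and 0.21 than a gap between those two.

```java
// Hypothetical standalone reproduction of Lucene 3.x's one-byte norm
// encoding (3 mantissa bits, zero exponent 15), modeled on
// org.apache.lucene.util.SmallFloat; it does not depend on Lucene.
public class NormQuantization {

    private static final int MANTISSA_BITS = 3;
    private static final int FZERO = (63 - 15) << MANTISSA_BITS;

    // float -> byte: keeps the exponent and the top 3 mantissa bits (truncates).
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - MANTISSA_BITS);
        if (smallfloat <= FZERO) {
            // zero or negative maps to 0; tiny positive values map to the smallest code
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (smallfloat >= FZERO + 0x100) {
            return (byte) -1; // too large: clamp to the maximum code
        }
        return (byte) (smallfloat - FZERO);
    }

    // byte -> float: rebuilds a float from the stored exponent and 3 mantissa bits.
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - MANTISSA_BITS);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // The value a boost effectively has after being written to the index.
    static float stored(float boost) {
        return decode(encode(boost));
    }

    public static void main(String[] args) {
        for (float boost : new float[] {0.2f, 0.3f, 0.5f, 0.8f, 0.999f}) {
            System.out.println(boost + " -> " + stored(boost));
        }
        // Query boosts are not encoded; only the document-side norms are.
        float d1 = 0.1f * stored(0.2f);
        float d2 = 0.2f * stored(0.3f) + 0.3f * stored(0.5f);
        System.out.println("score(q,d1) = " + d1 + ", score(q,d2) = " + d2);
    }
}
```

With this encoding, 0.2 is stored as 0.1875, 0.3 as 0.25, 0.8 as 0.75 and 0.999 as 0.875, while 0.5 survives exactly; the example scores therefore come out near 0.01875 and 0.2 rather than 0.02 and 0.21.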
>
> Thanks!
>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]