Hi Yuval,
You can extend Similarity directly, rather than DefaultSimilarity; that way you
don't burn any CPU cycles on TF/IDF calculations.
Alan
On 22 Feb 2012, at 07:17, Yuval Kesten wrote:
> Hi Em,
> 1. Regarding performance: the Similarity class (and my subclass as well)
> receives the IDF, TF and sum-of-squared-weights values as inputs and only
> factors them differently. So even though I ignore those values, they are
> still being computed.
> 2. I have written this code:
> static {
>     Similarity.setDefault(new MySimilarity());
> }
> Which means that I am setting the default similarity before doing the
> indexing and obviously before the searching.
> Thanks!
>
> -----Original Message-----
> From: Em [mailto:[email protected]]
> Sent: Tuesday, February 21, 2012 6:07 PM
> To: [email protected]
> Subject: Re: Custom lucene scoring - Dot product between field boost and
> query boost
>
> Hi Yuval,
>
>> 1. Performance: I am calculating all the TF/IDF stuff and NORMS for
>> nothing...
> You aren't calculating that much, since you declared all those values as
> constants. What are you worried about?
>
>> 2. The score I get from the TopScoreDocCollector is not the same as I
>> get from the Explanation.
>> Here is part of my code:
> Could you provide us the code where you are setting the Similarity, please?
>
> Kind regards,
> Em
>
> On 21.02.2012 at 16:18, Yuval Kesten wrote:
>> Hi,
>> I want to use Lucene with the following scoring logic:
>> When I index my documents I want to set for each field a score/weight.
>> When I query my index I want to set for each query term a score/weight.
>>
>> I will NEVER index or query with multiple instances of the same field: each
>> query (or document) contains at most one instance of any given field name.
>> My field values and query terms are not analyzed; each already consists of
>> a single token.
>>
>> I want the score to be simply the dot product of the query's field weights
>> and the document's field weights, counting only the fields whose values
>> match.
>>
>> For example:
>> Query:
>> Field Name | Field Value | Field Score
>> 1          | AA          | 0.1
>> 7          | BB          | 0.2
>> 8          | CC          | 0.3
>>
>> Document 1:
>> Field Name | Field Value | Field Score
>> 1          | AA          | 0.2
>> 2          | DD          | 0.8
>> 7          | CC          | 0.999
>> 10         | FFF         | 0.1
>>
>> Document 2:
>> Field Name | Field Value | Field Score
>> 7          | BB          | 0.3
>> 8          | CC          | 0.5
>>
>> The scores should be:
>> Score(q,d1) = FIELD_1_SCORE_Q * FIELD_1_SCORE_D1 = 0.1 * 0.2 = 0.02
>> Score(q,d2) = FIELD_7_SCORE_Q * FIELD_7_SCORE_D2 + FIELD_8_SCORE_Q *
>> FIELD_8_SCORE_D2 = (0.2 * 0.3) + (0.3 * 0.5) = 0.21
>>
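The expected scores in the example above can be checked outside Lucene. Here is a minimal sketch in plain Java, assuming each query or document is a map from field name to a single (value, weight) pair. The class and method names are made up for illustration and are not part of Lucene.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Lucene-independent reference scorer (hypothetical helper, for checking only). */
final class DotProductScore {

    /** One field: a single un-analyzed token plus its weight. */
    static final class Weighted {
        final String value;
        final double weight;

        Weighted(String value, double weight) {
            this.value = value;
            this.weight = weight;
        }
    }

    /** Sums queryWeight * docWeight over fields that both sides have with equal values. */
    static double score(Map<String, Weighted> query, Map<String, Weighted> doc) {
        double sum = 0.0;
        for (Map.Entry<String, Weighted> e : query.entrySet()) {
            Weighted d = doc.get(e.getKey());
            if (d != null && d.value.equals(e.getValue().value)) {
                sum += e.getValue().weight * d.weight;
            }
        }
        return sum;
    }

    /** The query from the example: fields 1, 7 and 8. */
    static Map<String, Weighted> exampleQuery() {
        Map<String, Weighted> q = new LinkedHashMap<String, Weighted>();
        q.put("1", new Weighted("AA", 0.1));
        q.put("7", new Weighted("BB", 0.2));
        q.put("8", new Weighted("CC", 0.3));
        return q;
    }

    /** Document 1 from the example; field 7 holds CC, so it does not match the query's BB. */
    static Map<String, Weighted> exampleDoc1() {
        Map<String, Weighted> d = new LinkedHashMap<String, Weighted>();
        d.put("1", new Weighted("AA", 0.2));
        d.put("2", new Weighted("DD", 0.8));
        d.put("7", new Weighted("CC", 0.999));
        d.put("10", new Weighted("FFF", 0.1));
        return d;
    }

    /** Document 2 from the example. */
    static Map<String, Weighted> exampleDoc2() {
        Map<String, Weighted> d = new LinkedHashMap<String, Weighted>();
        d.put("7", new Weighted("BB", 0.3));
        d.put("8", new Weighted("CC", 0.5));
        return d;
    }

    public static void main(String[] args) {
        System.out.println("d1: " + score(exampleQuery(), exampleDoc1()));
        System.out.println("d2: " + score(exampleQuery(), exampleDoc2()));
    }
}
```

Up to floating-point rounding, this gives 0.02 for Document 1 and 0.21 for Document 2, which are the numbers any Similarity-based solution should be compared against.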
>> What would be the best way to implement it, in terms of accuracy and
>> performance? (I don't need TF and IDF calculations.)
>>
>> I currently implement it by setting boosts on the fields and query terms.
>> Then I overrode the DefaultSimilarity class:
>>
>> public class MySimilarity extends DefaultSimilarity {
>>
>>     @Override
>>     public float computeNorm(String field, FieldInvertState state) {
>>         return state.getBoost();
>>     }
>>
>>     @Override
>>     public float queryNorm(float sumOfSquaredWeights) {
>>         return 1;
>>     }
>>
>>     @Override
>>     public float tf(float freq) {
>>         return 1;
>>     }
>>
>>     @Override
>>     public float idf(int docFreq, int numDocs) {
>>         return 1;
>>     }
>>
>>     @Override
>>     public float coord(int overlap, int maxOverlap) {
>>         return 1;
>>     }
>>
>> }
>>
>> And based on
>> http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/scoring.html
>> this should work.
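One caveat on accuracy, assuming Lucene 3.x (as the linked 3.5 docs suggest): the value returned by computeNorm is stored in the index as a single byte, so the field boosts are coarsely quantized at index time. The sketch below is a standalone re-implementation of that one-byte encoding as I understand it; the class name is made up, and it should be treated as an illustration rather than Lucene's own code.

```java
/** Illustrative one-byte norm encoding (hypothetical class, not Lucene's own). */
final class NormByteDemo {

    /** Packs a float into one byte, keeping only its leading exponent/mantissa bits. */
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);
        int zero = (63 - 15) << 3;
        if (small <= zero) {
            // Zero and negative inputs map to 0; tiny positives map to the smallest code.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= zero + 0x100) {
            return -1; // overflow: clamp to the largest representable code
        }
        return (byte) (small - zero);
    }

    /** Expands the byte back into a float; the dropped mantissa bits come back as zeros. */
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        float[] boosts = {0.2f, 0.3f, 0.5f, 0.8f, 0.999f};
        for (float boost : boosts) {
            System.out.println(boost + " -> " + decode(encode(boost)));
        }
    }
}
```

Under this encoding a boost of 0.2 comes back as 0.1875 and 0.999 comes back as 0.875, so the Document 1 score would come out near 0.1 * 0.1875 = 0.01875 rather than exactly 0.02. If exact weights matter, the norm is probably not the right place to store them.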
>> Problems:
>> 1. Performance: I am calculating all the TF/IDF stuff and NORMS for
>> nothing...
>> 2. The score I get from the TopScoreDocCollector is not the same as I get
>> from the Explanation.
>> Here is part of my code:
>>
>> indexSearcher = new IndexSearcher(IndexReader.open(directory, true));
>> TopScoreDocCollector collector = TopScoreDocCollector.create(iTopN, true);
>> indexSearcher.search(query, collector);
>> ScoreDoc[] hits = collector.topDocs().scoreDocs;
>> for (int i = 0; i < hits.length; ++i) {
>>     int docId = hits[i].doc;
>>     Document d = indexSearcher.doc(docId);
>>     double score = hits[i].score;
>>     String id = d.get(FIELD_ID);
>>     Explanation explanation = indexSearcher.explain(query, docId);
>> }
>>
>> Thanks!
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]