Thanks, I got it by doing something like this:

public class PartialSimilarity : DefaultSimilarity
{
    // Ignore how rare a term is: every term weighs the same.
    public override float Idf(long docFreq, long docCount)
    {
        return 1.0f;
    }

    // Ignore repeated occurrences: a matching term counts once.
    public override float Tf(float freq)
    {
        return 1.0f;
    }

    // Return the raw number of terms in the field instead of the
    // default length normalization.
    public override float LengthNorm(FieldInvertState state)
    {
        int numTerms;
        if (m_discountOverlaps)
        {
            numTerms = state.Length - state.NumOverlap;
        }
        else
        {
            numTerms = state.Length;
        }
        return (float)numTerms;
    }

    // Store the term count itself as the norm.
    public override long ComputeNorm(FieldInvertState state)
    {
        float normValue = LengthNorm(state);
        return (long)normValue;
    }

    public override float QueryNorm(float sumOfSquaredWeights)
    {
        return 1.0f;
    }

    // At search time each matching term contributes 1 / numTerms.
    public override float DecodeNormValue(long norm)
    {
        return 1.0f / (float)norm;
    }

    public override float Coord(int overlap, int maxOverlap)
    {
        return 1.0f;
    }
}

A slightly different variation of this is the following: if it’s a partial 
match, how can I return a score of 0? For example, if the query is “A B C” and 
the field contains “B D”, then I want the score to be 0. This requires knowing 
the sum of the scores of all terms, which I am not sure how to access.
My hunch is that I would need to create a specialized type of query, but it’s 
not clear to me what it needs to be. Any suggestions?
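
To make the intended behaviour concrete, here is a minimal sketch in plain C# with no Lucene types. The class and method names are hypothetical, and it only illustrates the scores I am after, not how to obtain them inside a Lucene query:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AllOrNothingSketch
{
    // Hypothetical helper, not a Lucene API: returns 1 only when every term
    // of the field also occurs in the query, and 0 otherwise.
    public static float Score(string[] queryTerms, string[] fieldTerms)
    {
        if (fieldTerms.Length == 0) return 0f;

        var query = new HashSet<string>(queryTerms);
        int matched = fieldTerms.Count(t => query.Contains(t));

        // Compare integer counts rather than a summed float score, so that
        // rounding cannot turn a full match into a near miss.
        return matched == fieldTerms.Length ? 1f : 0f;
    }
}
```

With the query “A B C”, this gives 1 for the field “B C” and 0 for the field “B D”.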
Best,
Georgios

From: Adrien Grand <jpou...@gmail.com>
Sent: Wednesday, May 22, 2024 12:20 AM
To: dev@lucene.apache.org
Subject: [EXTERNAL] Re: Question about extending Similarity

Hi Georgios,

This is possible. You need to create a similarity that stores the number of 
terms as a norm, and then produce scores that are equal to freq/norm at search 
time.
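
As a rough illustration of the arithmetic only, the following sketch in plain C# uses no Lucene types. The class and method names are hypothetical, and it assumes the norm is simply the number of terms in the field and that each field term found in the query adds 1 to the summed frequency:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FreqOverNormSketch
{
    // Hypothetical helper, not a Lucene API: summed frequency of the query
    // terms in the field, divided by the field's term count (the "norm").
    public static float Score(string[] queryTerms, string[] fieldTerms)
    {
        int norm = fieldTerms.Length;   // what would be stored at index time
        if (norm == 0) return 0f;       // avoid dividing by zero on empty fields

        var query = new HashSet<string>(queryTerms);
        int freq = fieldTerms.Count(t => query.Contains(t));
        return freq / (float)norm;
    }

    public static void Main()
    {
        var query = new[] { "A", "B", "C" };
        Console.WriteLine(Score(query, new[] { "B", "C" })); // 2 of 2 field terms match
        Console.WriteLine(Score(query, new[] { "B", "D" })); // 1 of 2 field terms match
    }
}
```

In a real similarity the norm would be written at index time and read back at search time; the sketch folds both steps into one method to show the resulting ratio.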

On Tue, May 21, 2024 at 8:02 PM Georgios Georgiadis 
<georgios.georgia...@microsoft.com.invalid> wrote:
Hi,
I would like to extend Similarity to have the following functionality: if the 
query is “A B C” and a field contains “B C” then I would like to call that a 
“match” and return a score of 1 (2/2). If the query is “A B C” and the field 
contains “B D” then I would like to call that a partial match and give a score 
of 0.5 (1/2). Is this possible?
Best,
Georgios


--
Adrien
