Your similarity looks ok.

> My hunch is that I would need to create a specialized type of query, but
> it’s not clear to me what it needs to be.

You are right, this requires a query. A similarity alone cannot do this. You could create a two-phase iterator that reads the norm field and returns false in matches() when the score doesn't match the length of the field.

In case you want a more general form of this, note that you could look into Lucene's monitor module, because what you are doing here consists of indexing conjunctive queries and trying to match sets of terms against them.

On Wed, May 22, 2024 at 8:50 PM Georgios Georgiadis <georgios.georgia...@microsoft.com.invalid> wrote:

> Thanks, I got it by doing something like this:
>
> public class PartialSimilarity : DefaultSimilarity
> {
>     public override float Idf(long docFreq, long docCount)
>     {
>         return 1.0f;
>     }
>
>     public override float Tf(float freq)
>     {
>         return 1.0f;
>     }
>
>     public override float LengthNorm(FieldInvertState state)
>     {
>         int numTerms;
>         if (m_discountOverlaps)
>         {
>             numTerms = state.Length - state.NumOverlap;
>         }
>         else
>         {
>             numTerms = state.Length;
>         }
>         return (float)numTerms;
>     }
>
>     public override long ComputeNorm(FieldInvertState state)
>     {
>         float normValue = LengthNorm(state);
>         return (long)normValue;
>     }
>
>     public override float QueryNorm(float sumOfSquaredWeights)
>     {
>         return 1.0f;
>     }
>
>     public override float DecodeNormValue(long norm)
>     {
>         return 1.0f / (float)norm;
>     }
>
>     public override float Coord(int overlap, int maxOverlap)
>     {
>         return 1.0f;
>     }
> }
>
> A slightly different variation of this is the following:
>
> If it’s a partial match, how can I return a score of 0? i.e. if query is
> “A B C” and the field contains “B D”, then, I want to say that the score is
> 0. This requires knowledge of the sum of scores of all terms, which I am
> not sure how I can access.
>
> My hunch is that I would need to create a specialized type of query, but
> it’s not clear to me what it needs to be. Any suggestions?
>
> Best,
>
> Georgios
>
> *From:* Adrien Grand <jpou...@gmail.com>
> *Sent:* Wednesday, May 22, 2024 12:20 AM
> *To:* dev@lucene.apache.org
> *Subject:* [EXTERNAL] Re: Question about extending Similarity
>
> Hi Georgios,
>
> This is possible. You need to create a similarity that stores the number
> of terms as a norm, and then produce scores that are equal to freq/norm at
> search time.
>
> On Tue, May 21, 2024 at 8:02 PM Georgios Georgiadis <georgios.georgia...@microsoft.com.invalid> wrote:
>
> Hi,
>
> I would like to extend Similarity to have the following functionality: if
> the query is “A B C” and a field contains “B C” then I would like to call
> that a “match” and return a score of 1 (2/2). If the query is “A B C” and
> the field contains “B D” then I would like to call that a partial match and
> give a score of 0.5 (1/2). Is this possible?
>
> Best,
>
> Georgios
>
> --
> Adrien

--
Adrien
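
For reference, the scoring rule discussed in this thread can be sketched in plain Java with no Lucene dependency. This is only an illustration under stated assumptions, not the Lucene API: the class and method names (`CoverageScoring`, `matchedTerms`, `coverageScore`, `strictScore`) are hypothetical, and terms are modeled as plain strings. The first method mirrors the "freq/norm" idea, with the field length playing the role of the norm; the second mirrors the check that a two-phase iterator's matches() would perform, rejecting a document unless the matched count equals the field length.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical helper, not part of Lucene: models the thread's scoring rule on plain term lists. */
final class CoverageScoring {

    private CoverageScoring() {}

    /**
     * Counts the field positions whose term also occurs in the query.
     * Assumption: every occurrence is counted (freq-based). If a term repeats in
     * the field, a Tf fixed at 1 would count each distinct term only once instead.
     */
    static int matchedTerms(Set<String> queryTerms, List<String> fieldTerms) {
        int matched = 0;
        for (String term : fieldTerms) {
            if (queryTerms.contains(term)) {
                matched++;
            }
        }
        return matched;
    }

    /** Partial-match score: matched terms divided by field length (the stored norm). */
    static float coverageScore(Set<String> queryTerms, List<String> fieldTerms) {
        if (fieldTerms.isEmpty()) {
            return 0f; // nothing indexed, nothing to cover
        }
        return (float) matchedTerms(queryTerms, fieldTerms) / fieldTerms.size();
    }

    /** Strict variant: 1 only when every field term is covered by the query, otherwise 0. */
    static float strictScore(Set<String> queryTerms, List<String> fieldTerms) {
        if (fieldTerms.isEmpty()) {
            return 0f;
        }
        return matchedTerms(queryTerms, fieldTerms) == fieldTerms.size() ? 1f : 0f;
    }
}
```

With query terms {A, B, C}, `coverageScore` gives 1.0 for the field [B, C] and 0.5 for [B, D], while `strictScore` gives 1.0 and 0.0 respectively. In an actual Lucene query the matched count would come from the matching term clauses and the field length from the norm, as described above.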