What replaces the computeNorm method in DefaultSimilarity in 4.1?
I've always subclassed DefaultSimilarity to resolve an issue whereby a
document that has multiple values in a field (because of a one-to-many
relationship) scores worse than a document that has just a single
value. The computeNorm() method I used to override has gone, so I
tried to rewrite the method for 4.1 as follows:
    public void computeNorm(FieldInvertState state, Norm norm) {
        if (state.getName().equals("alias")) {
            if (state.getLength() >= 3) {
                norm.setFloat(state.getBoost() * 0.578f);
            } else {
                super.computeNorm(state, norm);
            }
        } else {
            super.computeNorm(state, norm);
        }
    }
However, I found that computeNorm() is now final, so what should I do?
3.6 Code:
package org.musicbrainz.search.analysis;

import org.apache.lucene.index.FieldInvertState;
import org.apache.lucene.search.similarities.DefaultSimilarity;

/**
 * Calculates a score for a match, overridden to deal with problems
 * with alias fields in artist and label indexes
 */
public class MusicbrainzSimilarity extends DefaultSimilarity
{
    /**
     * Calculates a value which is inversely proportional to the number
     * of terms in the field. When multiple aliases are added to an
     * artist (or label) it is seen as one field, so artists with many
     * aliases can be disadvantaged against when the matching alias is
     * radically different to other aliases.
     *
     * @return score component
     */
    public float computeNorm(String field, FieldInvertState state) {
        // This will match both artist and label aliases and is
        // applicable to both, didn't use the constant
        // ArtistIndexField.ALIAS because that would be confusing
        if (field.equals("alias")) {
            if (state.getLength() >= 3) {
                // Same result as normal calc if field had three terms,
                // the most common scenario
                return state.getBoost() * 0.578f;
            } else {
                return super.computeNorm(field, state);
            }
        } else {
            return super.computeNorm(field, state);
        }
    }

    /**
     * This method calculates a value based on how many times the
     * search term was found in the field. Because we have only short
     * fields, the only real case (apart from rare exceptions like
     * Duran Duran Duran) whereby the term is found more than twice
     * would be when a search term matches multiple aliases. To remove
     * the bias this gives towards artists/labels with many aliases we
     * limit the value to what would be returned for a two term match.
     *
     * Note: would prefer to do this just for alias field, but the
     * field is not passed as a parameter.
     *
     * @param freq number of times the search term was found in the field
     * @return score component
     */
    @Override
    public float tf(float freq) {
        if (freq > 2.0f) {
            return 1.41f; // Same result as if matched term twice
        } else {
            return super.tf(freq);
        }
    }
}
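
To make the two constants in the class above concrete, here is a
minimal standalone sketch (plain Java, no Lucene on the classpath) of
the arithmetic behind them. It assumes DefaultSimilarity's usual
formulas, boost / sqrt(numTerms) for the length norm and sqrt(freq)
for tf, and it ignores discountOverlaps and the lossy single-byte
encoding of the norm. The class and method names are made up for
illustration and are not Lucene API.

```java
public class AliasScoringSketch {

    // Assumed default length norm before byte encoding: boost / sqrt(numTerms).
    static float defaultLengthNorm(int numTerms, float boost) {
        return boost * (float) (1.0 / Math.sqrt(numTerms));
    }

    // Capped variant: any field with three or more terms gets the
    // three-term value, so extra aliases no longer lower the norm.
    static float cappedLengthNorm(int numTerms, float boost) {
        return numTerms >= 3 ? boost * 0.578f : defaultLengthNorm(numTerms, boost);
    }

    // Assumed default term-frequency factor: sqrt(freq).
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Capped variant: more than two matches score like two matches.
    static float cappedTf(float freq) {
        return freq > 2.0f ? 1.41f : defaultTf(freq);
    }

    public static void main(String[] args) {
        // Compare default and capped norms as the alias count grows.
        for (int numTerms : new int[] {1, 2, 3, 10, 30}) {
            System.out.printf("terms=%d default=%.3f capped=%.3f%n",
                    numTerms,
                    defaultLengthNorm(numTerms, 1.0f),
                    cappedLengthNorm(numTerms, 1.0f));
        }
        // Compare default and capped tf as the match count grows.
        for (float freq : new float[] {1f, 2f, 3f, 5f}) {
            System.out.printf("freq=%.0f default=%.3f capped=%.3f%n",
                    freq, defaultTf(freq), cappedTf(freq));
        }
    }
}
```

With a boost of 1, the default norm keeps falling as aliases are
added, while the capped norm stays at the three-term value; likewise
the capped tf stops growing after two matches.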