On May 24, 2006, at 2:12 PM, Chris Hostetter wrote:
: Adding a fieldName argument to similarity.tf(freq) would add
: significant overhead, since it gets called a *lot*.
...why would having the extra param add significant overhead? ... or is
the point just that if someone wants customized tf based on field name,
it would be better to make that choice once when starting to score a
query (since the choice is going to be the same for all docs) rather
than every time tf is called, because the way a user chooses *might*
involve a lot of overhead?
I suppose it's true that the default wouldn't suffer much, if at all
-- it'd just ignore the fieldName param.
    public float tf(float freq, String fieldName) {
        return tf(freq);
    }
However, if you wanted to override that behavior, you'd have to apply
at least one conditional for each doc that the Scorer plows through.
    public float tf(float freq, String fieldName) {
        if ("title".equals(fieldName))
            return 1.0f;
        else
            return tf(freq);
    }
That's going to be less efficient than overriding that method in an
alternative Similarity instance for the field "title" and retrieving
it once. You never know how much until you benchmark it, of course.
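To make the "retrieve it once" alternative concrete, here is a rough sketch. Everything in it is a stand-in invented for illustration, not Lucene's actual API: the nested Similarity class, FlatTfSimilarity, SimilarityRegistry, and the sumTf() loop are all hypothetical, and the square-root tf is only an assumed default. The point is just that the field-name decision happens once, before the per-doc loop, so tf(freq) itself carries no conditional.

```java
import java.util.HashMap;
import java.util.Map;

public class PerFieldSimilaritySketch {

    /** Stand-in for a Similarity that keeps the one-argument tf(). */
    static class Similarity {
        public float tf(float freq) {
            // Assumed default for this sketch: square root of the frequency.
            return (float) Math.sqrt(freq);
        }
    }

    /** Hypothetical variant for fields where term frequency should be ignored. */
    static class FlatTfSimilarity extends Similarity {
        @Override
        public float tf(float freq) {
            return 1.0f;
        }
    }

    /** Hypothetical registry mapping field names to Similarity instances. */
    static class SimilarityRegistry {
        private final Map<String, Similarity> byField = new HashMap<>();
        private final Similarity fallback;

        SimilarityRegistry(Similarity fallback) {
            this.fallback = fallback;
        }

        void register(String fieldName, Similarity similarity) {
            byField.put(fieldName, similarity);
        }

        Similarity forField(String fieldName) {
            return byField.getOrDefault(fieldName, fallback);
        }
    }

    /** Stand-in for a scorer's inner loop over per-document frequencies. */
    static float sumTf(SimilarityRegistry registry, String fieldName, float[] freqs) {
        // The field-name lookup happens once, outside the loop.
        Similarity similarity = registry.forField(fieldName);
        float total = 0f;
        for (float freq : freqs) {
            total += similarity.tf(freq); // no per-doc conditional on the field name
        }
        return total;
    }

    public static void main(String[] args) {
        SimilarityRegistry registry = new SimilarityRegistry(new Similarity());
        registry.register("title", new FlatTfSimilarity());

        float[] freqs = {1f, 4f, 9f};
        System.out.println("title: " + sumTf(registry, "title", freqs));
        System.out.println("body:  " + sumTf(registry, "body", freqs));
    }
}
```

Whether this actually beats the conditional by a measurable margin is exactly the thing you'd have to benchmark.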
Should a similar change be made to IndexWriter, and replace
Similarity.lengthNorm(String,int) with lengthNorm(int)?
I like it. <evilgrin> That's one step closer towards assigning each
Field a pluggable, comprehensive codec. </evilgrin>
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/