Re: sumOfSquaredWeights for lengthNorm

Eugene Mon, 06 Mar 2006 21:07:32 -0800

Hi,

My comments in-line.


Chris Hostetter wrote:

: I would like to override the Similarity class lengthNorm(String
: fieldName, int numTerms) so that it behaves similar to queryNorm(float
: sumOfSquaredWeights).  So the method signature becomes lengthNorm(String
: fieldName, float sumOfSquaredWeights) where sumOfSquaredWeights = sum of
: the squares of doc term weights.
:
: Looking at the way sumOfSquaredWeights was used in
: org.apache.lucene.search.Query weight method, I would like to have a
: weight method in org.apache.lucene.document.Field (or may be in
: org.apache.lucene.document.Document) which returns the weight based on
: the terms in the Field. Can anyone tell me how to start?

can you explain more what you mean by "doc term weights" ?

It seems like what you are interested in doing is changing the way norm
value of a doc/field is determined so that it's determined not just by the
number of terms in the field, but also by the "weight" of some terms --
i'm not sure if you mean the terms being queried on, or the terms stored
in the field for the document

Yes, you got the idea; I mean the terms in the field. I think the term weights of the query are already factored into queryNorm. I want to normalize based on the weights of the field's terms too.

Two concepts that already exist (and may be useful to you) are:

1) the "boosts" associated with Fields and Documents at indexing time,
which are combined with the lengthNorm at index time to determine a single
"norm"  value for the doc/field pair.

I don't think this is what I want, because the lengthNorm is still using the number of terms.

2) the idf of the terms being queried on, which is multiplied by the field
norm as part of the query time scoring (you can see it in the
fieldWeight in a score Explanation)

Yes, I noticed this, but it is not what I want because it's using the "idf of the terms being queried". What I want is for fieldWeight to use 1/sqrt(sumOfSquaredWeights), where sumOfSquaredWeights = the sum of tf^2 over all terms in the field.
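
To make the formula concrete, here is a minimal sketch in plain Java. It makes no Lucene calls; the class name FieldNormSketch, both method names, and the sample terms are hypothetical and only illustrate the arithmetic, assuming the term frequencies of the field are already available as a map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: computes the norm described above
// from the raw term frequencies of a single field.
public class FieldNormSketch {

    // Sum of tf^2 over all distinct terms in the field.
    static float sumOfSquaredWeights(Map<String, Integer> termFreqs) {
        float sum = 0.0f;
        for (int tf : termFreqs.values()) {
            sum += (float) tf * tf;
        }
        return sum;
    }

    // 1 / sqrt(sumOfSquaredWeights). Returning 0 for an empty field is an
    // assumption made here only to avoid dividing by zero.
    static float lengthNorm(float sumOfSquaredWeights) {
        if (sumOfSquaredWeights <= 0.0f) {
            return 0.0f;
        }
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    public static void main(String[] args) {
        Map<String, Integer> termFreqs = new LinkedHashMap<>();
        termFreqs.put("lucene", 3); // term appears 3 times in the field
        termFreqs.put("norm", 4);   // term appears 4 times in the field
        float sum = sumOfSquaredWeights(termFreqs);
        System.out.println("sumOfSquaredWeights = " + sum);
        System.out.println("lengthNorm = " + lengthNorm(sum));
    }
}
```

The open question is still where to get the per-field term frequencies at indexing time, since lengthNorm(String fieldName, int numTerms) only receives the term count.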

3) I have another issue with the explanation, which seems to be a bug. Below, I've given a printout of the explanation. There's something strange: when I use my own Similarity, it prints out all query terms despite some of them not appearing in the doc (see "formulation": the docFreq = 0, but it appears in the explanation).

Also, the scores don't tally. I printed out the raw score for doc 21 using the HitCollector and it returns 1.4241531. When I print out the explanation, the score is 2.731636. Shouldn't these be the same, since neither is a normalized score?


------  Explanation --------
doc id:21      score = 1.4241531

Explanation = 2.731636 = sum of:
......
  0.30496213 = weight(Contents:formulation in 21), product of:
    0.40874794 = queryWeight(Contents:formulation), product of:
      5.9687076 = idf(docFreq=0)
      0.06848182 = queryNorm
    0.74608845 = fieldWeight(Contents:formulation in 21), product of:
      1.0 = tf(termFreq(Contents:formulation)=0)
      5.9687076 = idf(docFreq=0)
      0.125 = fieldNorm(field=Contents, doc=21)
......
------ End of Explanation --------
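
For reference, the numbers inside this subtree are at least internally consistent. The sketch below (plain Java, no Lucene calls; the class name ExplanationCheck and its method names are hypothetical, and the constants are copied from the printout above) recomputes the products. Note that tf is reported as 1.0 even though termFreq is 0.

```java
// Hypothetical check, not part of Lucene: recomputes the products shown in
// the explanation subtree for Contents:formulation in doc 21.
public class ExplanationCheck {

    // queryWeight = idf * queryNorm
    static float queryWeight(float idf, float queryNorm) {
        return idf * queryNorm;
    }

    // fieldWeight = tf * idf * fieldNorm
    static float fieldWeight(float tf, float idf, float fieldNorm) {
        return tf * idf * fieldNorm;
    }

    // weight = queryWeight * fieldWeight
    static float termWeight(float tf, float idf, float queryNorm, float fieldNorm) {
        return queryWeight(idf, queryNorm) * fieldWeight(tf, idf, fieldNorm);
    }

    public static void main(String[] args) {
        float idf = 5.9687076f;        // idf(docFreq=0) from the printout
        float queryNorm = 0.06848182f; // queryNorm from the printout
        float tf = 1.0f;               // tf reported for termFreq=0
        float fieldNorm = 0.125f;      // fieldNorm(field=Contents, doc=21)

        System.out.println("queryWeight = " + queryWeight(idf, queryNorm));
        System.out.println("fieldWeight = " + fieldWeight(tf, idf, fieldNorm));
        System.out.println("weight      = " + termWeight(tf, idf, queryNorm, fieldNorm));
    }
}
```

So the arithmetic within the subtree matches the printout to rounding; what I can't account for is why a term with docFreq=0 contributes at all, and why the total differs from the HitCollector score.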


Thanks.

--
Eugene
