Hi Paul,

Thanks for your detailed reply! It really helped a lot. However, I am seeing some values that appear to conflict.
For one of the documents in the result set, when I use

    IndexReader fir = IndexReader.open("index");
    byte[] fNorm = fir.norms("Body");
    System.out.println("FNorm: " + fNorm[306]);
    Document d = fir.document(306);
    Field f = d.getField("Body");
    System.out.println("Body: " + f.stringValue());

this prints an fNorm of 113, whereas the total number of terms (including stop words) in this field of the selected document is 42. In the explanation, fieldNorm(field=Body, doc=306) is 0.1562, which corresponds to approximately 41 terms for that field in that document. So the explanation value makes sense against the real data, provided all stop words such as "to", "it", "the", "&" etc. are counted.

So, my questions are:

1. Am I getting the norm values from the right place?
2. Is there any way to find out the number of indexed terms for each document?

Please advise!

Thanks,
Zia

On Wed, 2004-09-29 at 08:17, Paul Elschot wrote:
> Zia,
>
> On Tuesday 28 September 2004 21:22, you wrote:
> > Hi,
> >
> > I'm trying to learn the Scoring mechanism of Lucene. I want to fetch
> > each parameter value individually as they are collectively dumped out by
> > Explanation. I've managed to pull out TF and IDF values using
> > DefaultSimilarity and FilterIndexReader, but not sure from where to get
> > the fieldNorm and queryNorm from.
>
> The norms are here:
> http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#norms(java.lang.String)
> The resulting array is indexed by the document number for the IndexReader.
> With the default similarity, each norm is the inverse square root of the number of
> indexed terms in the document field. However, there are only 8 bits available to
> encode this value, so it's quite rough.
>
> The default queryNorm is here:
> http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/DefaultSimilarity.html#queryNorm(float)
> There is an explanation of the scoring in the javadocs of Similarity.
> There has been some discussion on an idf factor that was missing from this
> documentation; I don't know whether the docs have been adapted for this.
>
> > Also is there any reference about how normalisation has been
> > implemented?
>
> See above, DefaultSimilarity is the default implementation of the Similarity
> interface.
> queryNorm() takes a sumOfSquaredWeights, where the weights are the term weights
> from the query. It returns the inverse of the square root.
>
> It may be that the sum of squared weights should be a sum of square rooted weights
> and that queryNorm should return a square then.
> I posted this on lucene-user on 20 September:
> http://issues.apache.org/eyebrowse/[EMAIL PROTECTED]&msgNo=10023
>
> It's only a normalisation, so it doesn't affect the order of the search results much.
> Taking the square roots of the query term weights would have
> the query weights directly applied to the query term density in the document
> field, whereas now the weights seem to be applied to the square root of the density.
> The density value is an approximation, see above for the rough field norms.
>
> Regards,
> Paul Elschot
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]

--
Zia Syed <[EMAIL PROTECTED]>
Smartweb Research Center, Robert Gordon University
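
A note on the two norm values in this thread (113 from the norms array, 0.1562 from the explanation): they may well be the same quantity, since the array holds the 8-bit encoded form that Paul describes. The sketch below is a stand-alone illustration, not Lucene code. It assumes the byte layout is 3 mantissa bits and 5 exponent bits with a zero-exponent offset of 15, which is my understanding of the default encoding in Lucene of this era. The class name NormDecodeSketch and both method names are hypothetical. In a real program the library's own Similarity.decodeNorm(byte) would be the call to use, and the term-count estimate only holds when no field or document boost was applied at index time.

```java
public class NormDecodeSketch {

    // Decode an 8-bit norm into a float.
    // ASSUMPTION: 3-bit mantissa, 5-bit exponent, zero-exponent offset 15.
    static float decodeNorm(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int mantissa = b & 7;          // low 3 bits
        int exponent = (b >> 3) & 31;  // next 5 bits
        // Re-bias the exponent into IEEE-754 single-precision layout.
        int bits = ((exponent + (63 - 15)) << 24) | (mantissa << 21);
        return Float.intBitsToFloat(bits);
    }

    // With the default length normalisation, norm = 1 / sqrt(numTerms),
    // so numTerms is roughly 1 / norm^2 (assuming a boost of 1.0).
    static float approxTermCount(float norm) {
        return 1.0f / (norm * norm);
    }

    public static void main(String[] args) {
        float norm = decodeNorm((byte) 113);
        System.out.println("decoded norm: " + norm);  // prints 0.15625
        System.out.println("approx. terms: " + Math.round(approxTermCount(norm)));
    }
}
```

Under these assumptions the byte 113 decodes to 0.15625, which matches the fieldNorm of 0.1562 shown by the explanation, and inverting it gives about 41 terms. The gap to the actual count of 42 would then be rounding loss from the 8-bit encoding.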