Hi to all.
In the DocumentWriter.writeNorms(Document doc, String segment) method (Lucene v1.3),
I wonder if there is a special reason to compute the normalisation factor based on the number of tokens contained in the document (using the fieldLengths array) instead of computing it from the number of positions (the fieldPositions array).
I think that in most cases the difference is not significant, so using fieldLengths or fieldPositions would be equivalent. But I would like to be sure of it.
So, if anybody has an opinion ...
Thanks
Phil
Nota bene: =======
If I understood correctly, the fieldLengths value and the fieldPositions value differ for a given document if and only if the document contains at least one token with a position increment of 0.
In my case, such a token should not be counted in the normalisation factor, because I need this factor to be exactly inversely proportional to the number of DIFFERENT tokens (i.e. ignoring those with a position increment of 0).
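To make the case concrete, here is a minimal sketch of the kind of stream I have in mind: a filter that stacks a synonym at the same position as the original token (position increment 0). The class name and the synonym lookup are made up, and I am assuming the Token/TokenStream API of Lucene 1.3.

import java.io.IOException;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;

// Emits a synonym at the same position as the original token
// (position increment 0), so the token count and the position
// count of the field diverge.
public class ZeroIncrementSynonymFilter extends TokenStream {
    private final TokenStream input;
    private Token pendingSynonym = null;   // synonym waiting to be emitted

    public ZeroIncrementSynonymFilter(TokenStream input) {
        this.input = input;
    }

    public Token next() throws IOException {
        if (pendingSynonym != null) {       // emit the stacked synonym first
            Token stacked = pendingSynonym;
            pendingSynonym = null;
            return stacked;
        }
        Token t = input.next();
        if (t == null) return null;
        String syn = lookupSynonym(t.termText());   // made-up lookup
        if (syn != null) {
            Token s = new Token(syn, t.startOffset(), t.endOffset());
            s.setPositionIncrement(0);      // same position as the original
            pendingSynonym = s;
        }
        return t;
    }

    public void close() throws IOException {
        input.close();
    }

    // toy synonym table, just for the illustration
    private String lookupSynonym(String term) {
        return "quick".equals(term) ? "fast" : null;
    }
}

With such a stream, a document containing "quick" produces two tokens ("quick" and "fast") but only one position, so a norm based on fieldLengths and one based on fieldPositions would differ.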
This issue was discussed a couple of weeks ago. It seems that some folks use rather big position increments in order to mark sentence and paragraph boundaries. Note that positions are currently used only by PhraseQueries, and we do not want a PhraseQuery to match across the gap between sentences or paragraphs. However, this means that the number of positions and the number of tokens may differ considerably. A rough sketch of that technique is below.
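Something along these lines (the class name, the gap size and the naive boundary test are made up; I am assuming the Token position increment API of 1.3):

import java.io.IOException;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;

// Adds a large position increment to the first token after a sentence
// boundary, so a PhraseQuery cannot match across the boundary.
public class SentenceGapFilter extends TokenStream {
    private static final int SENTENCE_GAP = 100;  // arbitrary large gap
    private final TokenStream input;
    private boolean pendingGap = false;

    public SentenceGapFilter(TokenStream input) {
        this.input = input;
    }

    public Token next() throws IOException {
        Token t = input.next();
        if (t == null) return null;
        if (pendingGap) {
            // push this token far away from the previous sentence
            t.setPositionIncrement(t.getPositionIncrement() + SENTENCE_GAP);
            pendingGap = false;
        }
        if (t.termText().endsWith(".")) {  // naive sentence-boundary test
            pendingGap = true;
        }
        return t;
    }

    public void close() throws IOException {
        input.close();
    }
}

Here the positions grow by SENTENCE_GAP at every boundary while the token count does not, which is exactly why the two numbers can drift apart.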
Maybe you can solve your problem with the new IndexReader.setNorm. Unfortunately, this means that you have to stop indexing, close your writer, and open an IndexReader ... not very comfortable.
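Roughly like this (a sketch only: the index path, field name, document id and token count are placeholders, and I am assuming the float overload of the new setNorm):

import org.apache.lucene.index.IndexReader;

public class FixNorms {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open("/path/to/index");  // placeholder path
        try {
            int docId = 42;              // placeholder document id
            int distinctTokens = 17;     // counted by the application,
                                         // ignoring zero-increment tokens
            // roughly Lucene's default length normalisation, 1/sqrt(n)
            float norm = (float) (1.0 / Math.sqrt(distinctTokens));
            reader.setNorm(docId, "contents", norm);  // overwrite the stored norm
        } finally {
            reader.close();
        }
    }
}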
Christoph