Hi to all.
In the DocumentWriter.writeNorms(Document doc, String segment) method (Lucene v1.3),
I wonder if there is a special reason to compute the normalisation factor based on the number of tokens contained in the document (using the fieldLengths array) instead of computing it from the number of positions (the fieldPositions array).
I think that in most cases the difference is not significant, so using fieldLengths or fieldPositions would be equivalent. But I would like to be sure of it.
So, if anybody has an opinion ...
Thanks
Phil
Nota bene: =======
If I understood correctly, the fieldLengths value and the fieldPositions value differ for a given document if and only if the document contains at least one token with a position increment of 0.
In my case, such a token should not be counted in the normalisation factor, because I need this factor to be exactly inversely proportional to the number of DIFFERENT tokens (i.e. ignoring those with a position increment of 0).
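To make the case concrete, here is a minimal sketch of the kind of stream I have in mind: a filter that stacks a synonym at the same position as the original token (position increment 0). The class name and the synonym lookup are made up, and I am assuming the Token/TokenStream API of Lucene 1.3.

import java.io.IOException;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;

// Emits a synonym at the same position as the original token
// (position increment 0), so the token count and the position
// count of the field diverge.
public class ZeroIncrementSynonymFilter extends TokenStream {
    private final TokenStream input;
    private Token pendingSynonym = null;   // synonym waiting to be emitted

    public ZeroIncrementSynonymFilter(TokenStream input) {
        this.input = input;
    }

    public Token next() throws IOException {
        if (pendingSynonym != null) {       // emit the stacked synonym first
            Token stacked = pendingSynonym;
            pendingSynonym = null;
            return stacked;
        }
        Token t = input.next();
        if (t == null) return null;
        String syn = lookupSynonym(t.termText());   // made-up lookup
        if (syn != null) {
            Token s = new Token(syn, t.startOffset(), t.endOffset());
            s.setPositionIncrement(0);      // same position as the original
            pendingSynonym = s;
        }
        return t;
    }

    public void close() throws IOException {
        input.close();
    }

    // toy synonym table, just for the illustration
    private String lookupSynonym(String term) {
        return "quick".equals(term) ? "fast" : null;
    }
}

With such a stream, a document containing "quick" produces two tokens ("quick" and "fast") but only one position, so a norm based on fieldLengths and one based on fieldPositions would differ.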
This issue was discussed a couple of weeks ago. It seems that some folks use rather big position increments in order to mark sentence and paragraph boundaries. Note that positions are currently used only by PhraseQueries, and we do not want a PhraseQuery to match across the gap between sentences or paragraphs. However, this means that the number of positions and the number of tokens may differ considerably. A rough sketch of that technique is below.
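Something along these lines (the class name, the gap size and the naive boundary test are made up; I am assuming the Token position increment API of 1.3):

import java.io.IOException;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;

// Adds a large position increment to the first token after a sentence
// boundary, so a PhraseQuery cannot match across the boundary.
public class SentenceGapFilter extends TokenStream {
    private static final int SENTENCE_GAP = 100;  // arbitrary large gap
    private final TokenStream input;
    private boolean pendingGap = false;

    public SentenceGapFilter(TokenStream input) {
        this.input = input;
    }

    public Token next() throws IOException {
        Token t = input.next();
        if (t == null) return null;
        if (pendingGap) {
            // push this token far away from the previous sentence
            t.setPositionIncrement(t.getPositionIncrement() + SENTENCE_GAP);
            pendingGap = false;
        }
        if (t.termText().endsWith(".")) {  // naive sentence-boundary test
            pendingGap = true;
        }
        return t;
    }

    public void close() throws IOException {
        input.close();
    }
}

Here the positions grow by SENTENCE_GAP at every boundary while the token count does not, which is exactly why the two numbers can drift apart.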
Maybe you can solve your problem with the new IndexReader.setNorm. Unfortunately, this means that you have to stop indexing, close your writer, and open an IndexReader ... not very comfortable.
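Roughly like this (a sketch only: the index path, field name, document id and token count are placeholders, and I am assuming the float overload of the new setNorm):

import org.apache.lucene.index.IndexReader;

public class FixNorms {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open("/path/to/index");  // placeholder path
        try {
            int docId = 42;              // placeholder document id
            int distinctTokens = 17;     // counted by the application,
                                         // ignoring zero-increment tokens
            // roughly Lucene's default length normalisation, 1/sqrt(n)
            float norm = (float) (1.0 / Math.sqrt(distinctTokens));
            reader.setNorm(docId, "contents", norm);  // overwrite the stored norm
        } finally {
            reader.close();
        }
    }
}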
Christoph