Thanks for your reply, but I still have some doubts. From your answer, I understand that the document length is stored in a single byte to reduce memory consumption. While debugging, however, I found that the doc length passed to score() is 2621.44, whereas the actual doc length is 2355.
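To check whether the byte encoding alone could account for this gap, here is a minimal standalone sketch of the round trip as I read it in the 4.10.4 source. It assumes that encodeNormValue() stores boost / sqrt(length) through SmallFloat.floatToByte315, and that decodeNormValue() returns 1 / (f * f) for the decoded float f. The class name NormRoundTrip and the method roundTrip are mine, and the two helper methods are my own re-implementation of what I believe the SmallFloat methods do, not a copy of the library code.

```java
// Standalone sketch with no Lucene dependency. The bit manipulation below is
// an assumed equivalent of SmallFloat.floatToByte315 / byte315ToFloat.
class NormRoundTrip {

    // Keep only the top bits of the float and rebase the exponent so that
    // the result fits into one byte (assumed behaviour of floatToByte315).
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);
        int zero = (63 - 15) << 3;
        if (small <= zero) {
            // Underflow: zero or negative maps to 0, tiny positives to 1.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= zero + 0x100) {
            return -1; // Overflow: saturate at the largest byte value.
        }
        return (byte) (small - zero);
    }

    // Inverse of the method above; the discarded low bits come back as zeros.
    static float byte315ToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Hypothetical helper: encode a length the way I assume computeNorm()
    // does, then decode it the way I assume decodeNormValue() does.
    static float roundTrip(float boost, float length) {
        byte norm = floatToByte315(boost / (float) Math.sqrt(length));
        float f = byte315ToFloat(norm);
        return 1.0f / (f * f);
    }

    public static void main(String[] args) {
        // For a true length of 2355 this comes out at about 2621.44.
        System.out.println(roundTrip(1.0f, 2355f));
    }
}
```

If these assumptions hold, 1/sqrt(2355) is truncated to 0.01953125 when it is squeezed into the byte, and 1 / 0.01953125^2 is 2621.44, the same value I see in the debugger. Lengths whose inverse square root is exactly representable, such as 4, survive the round trip unchanged.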
I am confused. Please help.

On Fri, Jul 22, 2016 at 1:46 PM, Ahmet Arslan <iori...@yahoo.com> wrote:

> Hi Roy,
>
> It is about storing the document length into a byte (to use less memory).
> Please edit the source code to avoid this encode/decode thing:
>
>   /**
>    * Encodes the document length in a lossless way
>    */
>   @Override
>   public long computeNorm(FieldInvertState state) {
>     return state.getLength() - state.getNumOverlap();
>   }
>
>   @Override
>   public float score(int doc, float freq) {
>     // We have to supply something in case norms are omitted
>     return ModelBase.this.score(stats, freq,
>         norms == null ? 1L : norms.get(doc));
>   }
>
>   @Override
>   public Explanation explain(int doc, Explanation freq) {
>     return ModelBase.this.explain(stats, doc, freq,
>         norms == null ? 1L : norms.get(doc));
>   }
>
>
> On Thursday, July 21, 2016 6:06 PM, Dwaipayan Roy <dwaipayan....@gmail.com>
> wrote:
>
> Hello,
>
> In *SimilarityBase.java*, I can see that the length of the document is
> getting normalized by using the function *decodeNormValue()*. But I can't
> understand how the normalization is done. Can you please help? Also, is
> there any way to avoid this doc-length normalization, to use the raw
> doc-length (as used in LM-JM Zhai et al. SIGIR-2001)?
>
> Thanks..
>
> P.S. I am using Lucene 4.10.4
>

--
Dwaipayan Roy.