Thanks for the info! We're more interested in changing the lengthnorm function than in using additional stats for scoring, so option 2 seems like the right way to go.
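
For what it's worth, here is a rough sketch of the idea behind option 2, written in plain Java with no Lucene dependency. RawLengthNorm, encode, norm and the two sample functions are made-up names for illustration, not Lucene APIs; in an actual Similarity this logic would presumably live in the lengthNorm and encode/decode methods discussed in this thread.

```java
import java.util.function.LongToDoubleFunction;

/**
 * Hypothetical helper, not part of Lucene: keeps the raw term count as the
 * stored per-document value and defers the norm calculation to query time.
 */
final class RawLengthNorm {

    private RawLengthNorm() {}

    /** Index time: store the raw term count as a long instead of a lossy byte. */
    static long encode(int numTerms) {
        if (numTerms < 0) {
            throw new IllegalArgumentException("numTerms must be >= 0");
        }
        return numTerms;
    }

    /** Query time: apply whichever length-norm function is being tried out. */
    static double norm(long storedLength, LongToDoubleFunction normFunction) {
        return normFunction.applyAsDouble(storedLength);
    }

    /** 1/sqrt(length) normalization; returns 0 for empty fields. */
    static final LongToDoubleFunction INVERSE_SQRT =
        length -> length == 0 ? 0.0 : 1.0 / Math.sqrt((double) length);

    /** An alternative to compare against, for illustration only: 1/(1 + ln(length)). */
    static final LongToDoubleFunction INVERSE_LOG =
        length -> length == 0 ? 0.0 : 1.0 / (1.0 + Math.log((double) length));

    public static void main(String[] args) {
        long stored = encode(100);
        // The same stored length can be fed to different norm functions.
        System.out.println(norm(stored, INVERSE_SQRT));
        System.out.println(norm(stored, INVERSE_LOG));
    }
}
```

The point of the sketch is that the stored value stays exact, so switching norm functions would not require reindexing; the cost is more than one byte per document, which we are fine with.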

It looks like the encode and decode methods deal with bytes right now. Would changing those APIs to deal with longs instead be a good idea? As far as we can tell, the byte returned from encode is always cast to a long, and the byte passed into decode is always a long to begin with.

If we make this change, would it be useful to submit a patch for it?

Thanks,
Nalini

On Thu, Jun 19, 2014 at 10:28 AM, Uwe Schindler <u...@thetaphi.de> wrote:

> Hi,
>
> You may not need to change the length-norm at all: If you want to support
> *additional* statistics, add a docvalues field to your index where you can
> store that information in addition to the Lucene-Default statistics. Based
> on a function query you can then use it for scoring. In fact, you can then
> also use a different data type for the statistics value. The norms in
> Lucene are already internally handled as docvalues fields, too.
>
> On the other hand, if you want to modify the lengthNorm and you use a
> non-float value, you have to also modify the encodeNorm/decodeNorm methods
> of the similarity. The default uses a very lossy float->1byte
> transformation.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -----Original Message-----
> > From: Nalini Kartha [mailto:nalinikar...@gmail.com]
> > Sent: Thursday, June 19, 2014 7:14 PM
> > To: java-user@lucene.apache.org
> > Subject: Changing field lengthnorm to store length
> >
> > Hi,
> >
> > We're interested in having access to the number of terms in the fields
> > for a document vs the pre-calculated lengthnorm at scoring time - we
> > want to experiment with different lengthnorm functions, so it seems like
> > storing the raw length and then doing the norm calculation at query time
> > would work.
> >
> > Is changing the lengthnorm method on the Similarity class to return the
> > raw number of terms the right way to go for this? We realize this will
> > result in taking up more than a byte to store the value, but we're OK
> > with this. Will this break anything else under the hood?
> >
> > Thanks,
> > Nalini