Ok, makes sense. Thanks for the info!
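
For anyone finding this thread later, the approach discussed in the quoted messages (keep the raw term count as the norm and apply the length-norm function at query time) can be sketched roughly as below. This is plain Java with no Lucene dependency, so it only illustrates the idea. The class name RawLengthNormSketch, the helper methods, and the two example norm functions are all made up for illustration. In a real custom Similarity, the long would presumably come from computeNorm(FieldInvertState), for example from the state's field length, and be read back from the norms at scoring time.

```java
import java.util.function.LongToDoubleFunction;

// Hypothetical sketch: none of these names are Lucene APIs.
public class RawLengthNormSketch {

    // Index time: keep the raw number of terms instead of a lossy one-byte norm.
    // Stands in for the long a custom computeNorm(...) could return.
    public static long encodeRawLength(int numTerms) {
        return (long) numTerms;
    }

    // Query time: apply whichever length-norm function is being tried out.
    // Lengths below 1 are clamped to 1 to avoid division by zero.
    public static double lengthNorm(long storedLength, LongToDoubleFunction normFn) {
        return normFn.applyAsDouble(Math.max(1L, storedLength));
    }

    // Two example candidates; the first has the classic 1/sqrt(length) shape.
    public static final LongToDoubleFunction INVERSE_SQRT =
            len -> 1.0 / Math.sqrt((double) len);
    public static final LongToDoubleFunction INVERSE_LOG =
            len -> 1.0 / (1.0 + Math.log((double) len));

    public static void main(String[] args) {
        long stored = encodeRawLength(16);
        System.out.println("1/sqrt:    " + lengthNorm(stored, INVERSE_SQRT));
        System.out.println("1/(1+log): " + lengthNorm(stored, INVERSE_LOG));
    }
}
```

Because the stored value is the untouched length, switching norm functions needs no reindexing. The cost is that the value no longer fits the one-byte lookup-table optimizations Robert mentions.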

On Thu, Jun 19, 2014 at 3:05 PM, Robert Muir <rcm...@gmail.com> wrote:

> Don't extend that: extend Similarity.
>
> Some of those implementations actually rely on and optimize for the fact
> that it's a byte and build lookup tables and so on.
>
> On Thu, Jun 19, 2014 at 6:03 PM, Nalini Kartha <nalinikar...@gmail.com>
> wrote:
> > Sorry, I meant the encodeNormValue and decodeNormValue methods on the
> > TFIDFSimilarity class -
> >
> > public byte encodeNormValue(float f)
> > public float decodeNormValue(byte b)
> >
> > On Thu, Jun 19, 2014 at 12:08 PM, Robert Muir <rcm...@gmail.com> wrote:
> >
> >> No they do not. The method is:
> >>
> >> public abstract long computeNorm(FieldInvertState state);
> >>
> >> On Thu, Jun 19, 2014 at 1:54 PM, Nalini Kartha <nalinikar...@gmail.com>
> >> wrote:
> >> > Thanks for the info!
> >> >
> >> > We're more interested in changing the lengthnorm function vs using
> >> > additional stats for scoring so option 2 seems like the right way.
> >> >
> >> > It looks like the encode and decode methods deal with bytes right now -
> >> > would changing those APIs to deal with longs instead be a good idea? It
> >> > looks like the byte returned from encode is always being cast to long and
> >> > the byte passed into decode is always a long to begin with. If we make
> >> > this change, would it be useful to submit a patch for it?
> >> >
> >> > Thanks,
> >> > Nalini
> >> >
> >> > On Thu, Jun 19, 2014 at 10:28 AM, Uwe Schindler <u...@thetaphi.de> wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> You may not need to change the length-norm at all: If you want to support
> >> >> *additional* statistics, add a docvalues field to your index where you can
> >> >> store that information in addition to the Lucene-Default statistics. Based
> >> >> on a function query you can then use it for scoring. In fact, you can then
> >> >> also use a different data type for the statistics value. The norms in
> >> >> Lucene are already internally handled as docvalues fields, too.
> >> >>
> >> >> On the other hand, if you want to modify the lengthNorm and you use a
> >> >> non-float value, you have to also modify the encodeNorm/decodeNorm methods
> >> >> of the similarity. The default uses a very lossy float->1byte
> >> >> transformation.
> >> >>
> >> >> Uwe
> >> >>
> >> >> -----
> >> >> Uwe Schindler
> >> >> H.-H.-Meier-Allee 63, D-28213 Bremen
> >> >> http://www.thetaphi.de
> >> >> eMail: u...@thetaphi.de
> >> >>
> >> >> > -----Original Message-----
> >> >> > From: Nalini Kartha [mailto:nalinikar...@gmail.com]
> >> >> > Sent: Thursday, June 19, 2014 7:14 PM
> >> >> > To: java-user@lucene.apache.org
> >> >> > Subject: Changing field lengthnorm to store length
> >> >> >
> >> >> > Hi,
> >> >> >
> >> >> > We're interested in having access to the number of terms in the fields for a
> >> >> > document vs the pre-calculated lengthnorm at scoring time - we want to
> >> >> > experiment with different lengthnorm functions so it seems like storing the
> >> >> > raw length and then doing the norm calculation at query time would work.
> >> >> >
> >> >> > Is changing the lengthnorm method on the Similarity class to return the raw
> >> >> > number of terms the right way to go for this? We realize this will result in
> >> >> > taking up more than a byte to store the value but we're OK with this. Will this
> >> >> > break anything else under the hood?
> >> >> >
> >> >> > Thanks,
> >> >> > Nalini
> >> >>
> >> >> ---------------------------------------------------------------------
> >> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> >> For additional commands, e-mail: java-user-h...@lucene.apache.org