Thanks for the info!

We're more interested in changing the lengthNorm function than in using
additional statistics for scoring, so the second option (modifying
lengthNorm together with encodeNorm/decodeNorm) seems like the right way
to go.

It looks like the encode and decode methods deal with bytes right now.
Would changing those APIs to deal with longs instead be a good idea? It
looks like the byte returned from encode is always cast to a long, and the
value passed into decode always starts out as a long. If we make this
change, would it be useful to submit a patch for it?
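
To make the question concrete, here is a rough, self-contained sketch of
the two approaches. Everything in it is hypothetical: the class and method
names are made up for illustration and are not Lucene APIs, and the lossy
encoder is only a stand-in for the kind of float->1byte transformation
described below, not a copy of the library code.

```java
import java.util.function.LongToDoubleFunction;

/** Hypothetical sketch; none of these names are Lucene APIs. */
final class RawLengthNormSketch {

    // Lossy path: squeeze a precomputed float norm into a single byte by
    // keeping only the top bits of the float representation.
    static byte lossyEncode(float norm) {
        int bits = Float.floatToRawIntBits(norm);
        int small = bits >> 21;   // drop the low 21 bits of the float
        int zero = 48 << 3;       // smallest representable exponent
        if (small <= zero) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // underflow
        }
        if (small >= zero + 0x100) {
            return (byte) -1;                         // overflow
        }
        return (byte) (small - zero);
    }

    // Inverse of lossyEncode; returns the rounded-down norm value.
    static float lossyDecode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = ((b & 0xff) << 21) + (48 << 24);
        return Float.intBitsToFloat(bits);
    }

    // Raw path: store the number of terms unchanged as a long.
    static long encodeRaw(int numTerms) {
        return numTerms;
    }

    // Raw path: apply whichever length norm function we want at query time.
    static double decodeRaw(long stored, LongToDoubleFunction lengthNorm) {
        return lengthNorm.applyAsDouble(stored);
    }

    public static void main(String[] args) {
        LongToDoubleFunction invSqrt = len -> 1.0 / Math.sqrt(len);
        for (int len : new int[] {100, 110}) {
            byte b = lossyEncode((float) invSqrt.applyAsDouble(len));
            System.out.println(len + " terms: lossy norm=" + lossyDecode(b)
                + ", raw length=" + encodeRaw(len)
                + ", query-time norm=" + decodeRaw(encodeRaw(len), invSqrt));
        }
    }
}
```

In this sketch, fields of 100 and 110 terms end up with the same stored
byte on the lossy path, while the raw path keeps them distinct and lets us
swap in a different length norm function without reindexing.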

Thanks,
Nalini


On Thu, Jun 19, 2014 at 10:28 AM, Uwe Schindler <u...@thetaphi.de> wrote:

> Hi,
>
> You may not need to change the length-norm at all: If you want to support
> *additional* statistics, add a docvalues field to your index where you can
> store that information in addition to the Lucene-Default statistics. Based
> on a function query you can then use it for scoring. In fact, you can then
> also use a different data type for the statistics value. The norms in
> Lucene are already internally handled as docvalues fields, too.
>
> On the other hand, if you want to modify the lengthNorm and you use a
> non-float value, you have to also modify the encodeNorm/decodeNorm methods
> of the similarity. The default uses a very lossy float->1byte
> transformation.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
> > -----Original Message-----
> > From: Nalini Kartha [mailto:nalinikar...@gmail.com]
> > Sent: Thursday, June 19, 2014 7:14 PM
> > To: java-user@lucene.apache.org
> > Subject: Changing field lengthnorm to store length
> >
> > Hi,
> >
> > We're interested in having access to the number of terms in the fields
> > for a document vs the pre-calculated lengthnorm at scoring time - we
> > want to experiment with different lengthnorm functions so it seems like
> > storing the raw length and then doing the norm calculation at query
> > time would work.
> >
> > Is changing the lengthnorm method on the Similarity class to return the
> > raw number of terms the right way to go for this? We realize this will
> > result in taking up more than a byte to store the value but we're OK
> > with this. Will this break anything else under the hood?
> >
> > Thanks,
> > Nalini
