Re: Changing field lengthnorm to store length

Robert Muir Thu, 19 Jun 2014 12:09:02 -0700

No, they do not. The method on Similarity already returns a long:

  public abstract long computeNorm(FieldInvertState state);
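
Since the norm is already a long, the raw term count can be stored as is and the length normalization applied at query time. Below is a minimal sketch of that idea. It is deliberately Lucene-free: the class RawLengthNormSketch, its computeNorm(int) stand-in, and the two candidate functions are hypothetical names chosen for illustration, not part of any Lucene API.

```java
import java.util.function.LongToDoubleFunction;

// Hypothetical, Lucene-free sketch: every name here is illustrative.
final class RawLengthNormSketch {

    // Index time: keep the raw term count as the norm instead of a
    // precomputed, quantized length normalization factor.
    static long computeNorm(int fieldLength) {
        return fieldLength;
    }

    // Query time: candidate length-norm functions over the stored length.
    static final LongToDoubleFunction INVERSE_SQRT =
        len -> len <= 0 ? 0.0 : 1.0 / Math.sqrt(len);
    static final LongToDoubleFunction INVERSE_LOG =
        len -> len <= 0 ? 0.0 : 1.0 / (1.0 + Math.log(len));

    // Apply whichever length-norm function is being tried to a raw score.
    static double score(double rawScore, long storedNorm, LongToDoubleFunction lengthNorm) {
        return rawScore * lengthNorm.applyAsDouble(storedNorm);
    }

    public static void main(String[] args) {
        long norm = computeNorm(100); // a field with 100 terms
        System.out.println(score(2.0, norm, INVERSE_SQRT));
        System.out.println(score(2.0, norm, INVERSE_LOG));
    }
}
```

Because the stored value is the length itself, swapping the normalization function needs no reindexing; only the function passed to score changes.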




On Thu, Jun 19, 2014 at 1:54 PM, Nalini Kartha <[email protected]> wrote:
> Thanks for the info!
>
> We're more interested in changing the lengthnorm function vs using
> additional stats for scoring so option 2 seems like the right way.
>
> It looks like the encode and decode methods deal with bytes right now -
> would changing those APIs to deal with longs instead be a good idea? It
> looks like the byte returned from encode is always being cast to long and
> the byte passed into decode is always a long to begin with. If we make this
> change, would it be useful to submit a patch for it?
>
> Thanks,
> Nalini
>
>
> On Thu, Jun 19, 2014 at 10:28 AM, Uwe Schindler <[email protected]> wrote:
>
>> Hi,
>>
>> You may not need to change the length-norm at all: If you want to support
>> *additional* statistics, add a docvalues field to your index where you can
>> store that information in addition to the Lucene-Default statistics. Based
>> on a function query you can then use it for scoring. In fact, you can then
>> also use a different data type for the statistics value. The norms in
>> Lucene are already internally handled as docvalues fields, too.
>>
>> On the other hand, if you want to modify the lengthNorm and you use a
>> non-float value, you have to also modify the encodeNorm/decodeNorm methods
>> of the similarity. The default uses a very lossy float->1byte
>> transformation.
>>
>> Uwe
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: [email protected]
>>
>>
>> > -----Original Message-----
>> > From: Nalini Kartha [mailto:[email protected]]
>> > Sent: Thursday, June 19, 2014 7:14 PM
>> > To: [email protected]
>> > Subject: Changing field lengthnorm to store length
>> >
>> > Hi,
>> >
>> > We're interested in having access to the number of terms in the fields
>> > for a document rather than the pre-calculated lengthnorm at scoring
>> > time - we want to experiment with different lengthnorm functions, so it
>> > seems like storing the raw length and then doing the norm calculation at
>> > query time would work.
>> >
>> > Is changing the lengthnorm method on the Similarity class to return the
>> > raw number of terms the right way to go for this? We realize this will
>> > result in taking up more than a byte to store the value, but we're OK
>> > with this. Will this break anything else under the hood?
>> >
>> > Thanks,
>> > Nalini
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>>
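
To make Uwe's point about the lossy float->1byte transformation concrete, here is a small self-contained sketch. It assumes an 8-bit float layout with 3 mantissa bits and 5 exponent bits and a 1/sqrt(numTerms) length norm; the class LossyNormSketch and its methods are hypothetical stand-ins written for this illustration, so treat the exact bit layout as an assumption rather than as Lucene's actual code.

```java
// Hypothetical stand-in for a one-byte norm encoding (3-bit mantissa,
// 5-bit exponent). Names and layout are assumptions for illustration.
final class LossyNormSketch {

    private static final int ZERO_POINT = (63 - 15) << 3;

    // Squeeze a float into a single byte by keeping only its top bits.
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> 21;
        if (small <= ZERO_POINT) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // underflow
        }
        if (small >= ZERO_POINT + 0x100) {
            return (byte) -1; // overflow: clamp to the largest code
        }
        return (byte) (small - ZERO_POINT);
    }

    // Expand the byte back into a float; the dropped bits are gone.
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << 21;
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Assumed length norm: inverse square root of the term count.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        for (int len : new int[] {8, 10, 11, 16}) {
            float exact = lengthNorm(len);
            float stored = decode(encode(exact));
            System.out.println(len + " terms: exact=" + exact + " stored=" + stored);
        }
    }
}
```

Under these assumptions, fields of 8 and 10 terms end up with the same stored byte, as do fields of 11 and 16 terms, which is why the raw length cannot be recovered from such a norm at query time.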

