Now sending to lucene.apache.org.

---------- Forwarded message ----------
From: Artem Chereisky <[email protected]>
Date: Sun, Jan 17, 2010 at 1:30 PM
Subject: Re: synonyms
To: [email protected]
Cc: [email protected]


Hi Michael,

I refer to a thread between the two of us about a month and a half ago when
you helped me with lengthNorm for synonyms. It required a change to Lucene
core as per below. I implemented the change and it worked great for me.

I'm now in the middle of moving to 2.9, and that particular change has got me
stuck. I'm wondering whether you had to deal with the same issue and, if so,
how you managed it.

Here's the issue:
In 2.4 there was this line of code:
  Token token = perThread.localToken.Reinit(stringValue, fieldState.offset, fieldState.offset + valueLength);

In 2.9 it has changed to:
  perThread.singleTokenTokenStream.Reinit(stringValue, 0, valueLength);
and it doesn't return a Token.

I can't see how to get hold of the Token in order to implement the same logic
further down:

  if (token.IncludeInFieldLength)
  {
      fieldState.length++;
  }
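For reference, here is a minimal, self-contained C# sketch of the 2.4-era logic described later in this thread: a per-token IncludeInFieldLength flag that defaults to true, is set to false for injected synonyms, and gates the field-length increment. It has no Lucene.Net dependency; SketchToken, FieldLengthSketch, InjectSynonyms and CountFieldLength are hypothetical names used only for illustration. It models the counting rule only and does not show how to reach the Token from singleTokenTokenStream in 2.9.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for Lucene.Net's Token, holding only what this sketch needs.
// IncludeInFieldLength mirrors the custom property described in this thread;
// it is NOT part of stock Lucene.Net.
public sealed class SketchToken
{
    public string Term { get; }
    public int PositionIncrement { get; }
    public bool IncludeInFieldLength { get; set; } = true; // default: counts toward length

    public SketchToken(string term, int positionIncrement)
    {
        Term = term;
        PositionIncrement = positionIncrement;
    }
}

public static class FieldLengthSketch
{
    // Mimics the patched loop: the field length is only incremented
    // when the token's IncludeInFieldLength flag is true.
    public static int CountFieldLength(IEnumerable<SketchToken> tokens)
    {
        int length = 0;
        foreach (var token in tokens)
        {
            if (token.IncludeInFieldLength)
            {
                length++;
            }
        }
        return length;
    }

    // Hypothetical synonym-injection step: each synonym is stacked on the
    // original word (position increment 0) and excluded from the field length.
    public static List<SketchToken> InjectSynonyms(
        IEnumerable<string> words, IDictionary<string, string[]> synonyms)
    {
        var result = new List<SketchToken>();
        foreach (var word in words)
        {
            result.Add(new SketchToken(word, 1));
            if (synonyms.TryGetValue(word, out var alternatives))
            {
                foreach (var alt in alternatives)
                {
                    result.Add(new SketchToken(alt, 0) { IncludeInFieldLength = false });
                }
            }
        }
        return result;
    }

    public static void Main()
    {
        var synonyms = new Dictionary<string, string[]> { ["car"] = new[] { "automobile", "auto" } };
        var tokens = InjectSynonyms(new[] { "red", "car" }, synonyms);
        // prints "4 tokens, field length 2"
        Console.WriteLine($"{tokens.Count} tokens, field length {CountFieldLength(tokens)}");
    }
}
```

With "red car" and two synonyms for "car", four tokens are produced but only the two original words count toward the field length.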

Any help would be appreciated.

Regards,
Art



On Fri, Dec 18, 2009 at 12:11 PM, Artem Chereisky <[email protected]> wrote:

> Thank you, Michael. You've been helpful as always.
>
> Art
> -a
>
>
> On 18/12/2009, at 6:06, Michael Garski <[email protected]> wrote:
>
>  Artem,
>>
>> Here's a description of the change made to allow for customizing the
>> length norm when indexing synonyms. I do not have a patch available
>> for it at this time. While I made the change for 2.4, the same approach
>> could be taken in 2.9; however, there may be a better way of implementing
>> it using Attributes, which I have not yet investigated.
>>
>> I added a bool property to Token named 'IncludeInFieldLength' that
>> defaulted to true; then, in a custom analyzer, if I did not want a Token
>> to count towards the field length I would set the value to false.
>> Within the DocInverterPerField class I altered the internals of
>> processFields(Fieldable[] fields, int count) to only increment the value
>> of fieldState.length if the 'IncludeInFieldLength' property on the token
>> is set to true.
>>
>> I made the change to handle the same use case you have - synonym
>> injection - and it worked great.
>>
>> Michael
>>
>> -----Original Message-----
>> From: Artem Chereisky [mailto:[email protected]]
>> Sent: Wednesday, December 16, 2009 5:30 PM
>> To: [email protected]
>> Subject: synonyms
>>
>> Hi Everyone,
>>
>> I implemented synonyms using SynonymFilter and SynonymTree classes, which I
>> ported from Java. The solution supports multi-word synonyms and it seems to
>> work fine.
>>
>> One problem with this approach is that, although synonyms are at the same
>> position in the index, each one gets counted towards the total number of
>> terms. That adversely affects lengthNorm. Michael Garski mentioned earlier
>> that he came across a similar issue and solved it. Am I correct, Michael? If
>> so, could you share your approach, please?
>>
>> Synonyms are a fairly standard feature. Is there a 'best practice'
>> solution?
>>
>> Regards,
>> Art
>>
>>
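To make the lengthNorm problem in the original question concrete, here is a second self-contained C# sketch. It assumes the synonym filter emits each synonym with a position increment of 0 (the thread says synonyms sit at the same position) and compares counting every token against an assumed alternative that counts only tokens which advance the position. This is not the approach Michael describes; LengthNormSketch, CountAll and CountAdvancing are hypothetical names, and only the 1/sqrt(numTerms) formula is taken from Lucene's DefaultSimilarity.lengthNorm.

```csharp
using System;
using System.Linq;

// Self-contained illustration (no Lucene.Net dependency) of how stacked
// synonyms inflate the field length fed into a 1/sqrt(length) norm.
public static class LengthNormSketch
{
    // Same formula as Lucene's DefaultSimilarity.lengthNorm: 1 / sqrt(numTerms).
    public static float LengthNorm(int numTerms) => (float)(1.0 / Math.Sqrt(numTerms));

    // Counts every token, i.e. the behaviour described in the original question.
    public static int CountAll(int[] positionIncrements) => positionIncrements.Length;

    // Assumed alternative: only tokens that advance the position count,
    // so synonyms emitted with a position increment of 0 are ignored.
    public static int CountAdvancing(int[] positionIncrements) =>
        positionIncrements.Count(inc => inc > 0);

    public static void Main()
    {
        // "red car" with two synonyms stacked on "car": increments 1, 1, 0, 0.
        int[] increments = { 1, 1, 0, 0 };

        int all = CountAll(increments);
        int advancing = CountAdvancing(increments);

        Console.WriteLine($"all tokens: length {all}, norm {LengthNorm(all):F3}");
        Console.WriteLine($"advancing only: length {advancing}, norm {LengthNorm(advancing):F3}");
    }
}
```

Counting all four tokens gives a norm of 0.5, while counting only the two position-advancing tokens gives roughly 0.707, the same value as the field without synonyms. One trade-off of keying on position increments instead of a dedicated flag is that any other filter emitting increment-0 tokens would also be discounted.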
