Artem,

I made the changes for 2.9 and can create a patch that can be applied against the trunk. I'll create a JIRA issue and post the patch, along with some sample code showing how to use it, when I am back in the office on Tuesday.

I don't expect this patch to be committed to the trunk, as it alters the internal behavior slightly; more likely it will sit in the contrib section.

Michael


On Jan 16, 2010, at 6:34 PM, "Artem Chereisky" <[email protected]> wrote:

now sending to lucene.apache.org

---------- Forwarded message ----------
From: Artem Chereisky <[email protected]>
Date: Sun, Jan 17, 2010 at 1:30 PM
Subject: Re: synonyms
To: [email protected]
Cc: [email protected]


Hi Michael,

I refer to a thread between the two of us about a month and a half ago when you helped me with lengthNorm for synonyms. It required a change to Lucene core as per below. I implemented the change and it worked great for me.

I'm now in the middle of moving to 2.9 and that particular change has me stuck. I'm wondering if you had to deal with the same issue and, if so, how you managed it.

Here's the issue:
In 2.4 there was this line of code:
Token token = perThread.localToken.Reinit(stringValue, fieldState.offset, fieldState.offset + valueLength);

In 2.9 it has changed to:
perThread.singleTokenTokenStream.Reinit(stringValue, 0, valueLength);
and it no longer returns a Token.

I can't see how to get hold of a Token to implement the same logic further down:

                       if (token.IncludeInFieldLength)
                       {
                           fieldState.length++;
                       }

Any help would be appreciated.

Regards,
Art



On Fri, Dec 18, 2009 at 12:11 PM, Artem Chereisky <[email protected]> wrote:

Thank you, Michael. You've been helpful as always.

Art
-a


On 18/12/2009, at 6:06, Michael Garski <[email protected]> wrote:

Artem,

Here's a description of the change I made to allow customizing the length norm when indexing synonyms. I do not have a patch available for it at this time. I made the change for 2.4, and the same approach could be taken in 2.9; however, there may be a better way of implementing it using Attributes, which I have not yet investigated.

I added a bool property to Token named 'IncludeInFieldLength' that defaulted to true; then, in a custom analyzer, if I did not want a Token to count towards the field length I would set the value to false. Within the DocInverterPerField class I altered the internals of processFields(Fieldable[] fields, int count) to increment the value of fieldState.length only if the 'IncludeInFieldLength' property on the token is set to true.
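
As a rough, standalone illustration of the counting rule described above (not the actual patch, and not using any Lucene classes): SketchToken and FieldLengthSketch are hypothetical names made up for this sketch, and only the 'IncludeInFieldLength' flag and the conditional increment come from the description.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for Lucene's Token; it holds only what this sketch needs.
public sealed class SketchToken
{
    public string Term { get; }
    public int PositionIncrement { get; }

    // Mirrors the 'IncludeInFieldLength' property from the description; defaults to true.
    public bool IncludeInFieldLength { get; set; } = true;

    public SketchToken(string term, int positionIncrement)
    {
        Term = term;
        PositionIncrement = positionIncrement;
    }
}

public static class FieldLengthSketch
{
    // Models the altered counting step: only tokens flagged true add to the length.
    public static int CountFieldLength(IEnumerable<SketchToken> tokens)
    {
        int length = 0;
        foreach (SketchToken token in tokens)
        {
            if (token.IncludeInFieldLength)
            {
                length++;
            }
        }
        return length;
    }

    public static void Main()
    {
        var tokens = new List<SketchToken>
        {
            new SketchToken("quick", 1),
            // Injected synonym at the same position, excluded from the field length.
            new SketchToken("fast", 0) { IncludeInFieldLength = false },
            new SketchToken("fox", 1),
        };
        Console.WriteLine(FieldLengthSketch.CountFieldLength(tokens)); // prints 2
    }
}
```

In this model a synonym filter would be the place to set the flag to false on each injected token, so the original terms alone determine the field length.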

I made the change to handle the same use case you have - synonym
injection - and it worked great.

Michael

-----Original Message-----
From: Artem Chereisky [mailto:[email protected]]
Sent: Wednesday, December 16, 2009 5:30 PM
To: [email protected]
Subject: synonyms

Hi Everyone,

I implemented synonyms using SynonymFilter and SynonymTree classes which I ported from Java. The solution supports multi-word synonyms and it seems to work fine.

One problem with this approach is that, although synonyms are at the same position in the index, each one gets counted towards the total number of terms. That adversely affects lengthNorm. Michael Garski mentioned earlier that he came across a similar issue and solved it. Am I correct, Michael? If so, could you share your approach, please?
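
To put rough numbers on the effect, here is a small standalone sketch. It assumes the default similarity, whose length norm is 1 / sqrt(numTerms); LengthNormSketch is a hypothetical name, and the term counts are invented for illustration.

```csharp
using System;

public static class LengthNormSketch
{
    // Assumed formula, matching the default similarity: 1 / sqrt(numTerms).
    public static float LengthNorm(int numTerms)
    {
        return (float)(1.0 / Math.Sqrt(numTerms));
    }

    public static void Main()
    {
        // A field with 4 original terms has a norm of 0.5.
        Console.WriteLine(LengthNormSketch.LengthNorm(4));

        // If each term gets 3 injected synonyms, 16 terms are counted
        // and the norm drops to 0.25, although the text is no longer.
        Console.WriteLine(LengthNormSketch.LengthNorm(16));
    }
}
```

Under that assumption, a field with many injected synonyms scores as if it were a longer document, even though its original text is unchanged.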

Synonyms are a fairly standard feature. Is there a 'best practice' solution?

Regards,
Art


