Artem,
I made the changes for 2.9 and can create a patch that can be applied
against the trunk. I'll create a JIRA issue and post the patch along
with some sample code on how to use it when I am back in the office
on Tuesday.
I don't see this patch being committed to the trunk, as it alters the
internal behavior slightly; I see it sitting in the contrib section
instead.
Michael
On Jan 16, 2010, at 6:34 PM, "Artem Chereisky" <[email protected]> wrote:
now sending to lucene.apache.org
---------- Forwarded message ----------
From: Artem Chereisky <[email protected]>
Date: Sun, Jan 17, 2010 at 1:30 PM
Subject: Re: synonyms
To: [email protected]
Cc: [email protected]
Hi Michael,
I refer to a thread between the two of us about a month and a half ago
when you helped me with lengthNorm for synonyms. It required a change to
Lucene core as per below. I implemented the change and it worked great
for me.
I'm now in the middle of moving to 2.9 and that particular change got me
stuck. I'm wondering if you had to deal with the same issue and if yes,
how did you manage it?
Here's the issue:
In 2.4 there was this line of code:
    Token token = perThread.localToken.Reinit(stringValue,
        fieldState.offset, fieldState.offset + valueLength);
In 2.9 it's changed to:
    perThread.singleTokenTokenStream.Reinit(stringValue, 0, valueLength);
and it doesn't return a Token.
I can't see how I can get hold of Token to implement the same logic
further down:
    if (token.IncludeInFieldLength)
    {
        fieldState.length++;
    }
Any help would be appreciated.
Regards,
Art
On Fri, Dec 18, 2009 at 12:11 PM, Artem Chereisky <[email protected]> wrote:
Thank you, Michael. You've been helpful as always.
Art
-a
On 18/12/2009, at 6:06, Michael Garski <[email protected]>
wrote:
Artem,
Here's a description of the change made to allow for customizing the
length norm when indexing synonyms. I do not have a patch available
for it at this time. While I made the change for 2.4, the same approach
could be taken in 2.9. There may be a better way of implementing it
using Attributes, but I have not yet investigated that approach.
I added a bool property to Token named 'IncludeInFieldLength' that
defaulted to true, then in a custom analyzer if I did not want a Token
to count towards the field length I would set the value to false.
Within the DocInverterPerField class I altered the internals of
processFields(Fieldable[] fields, int count) to only increment the value
of fieldState.length if the 'IncludeInFieldLength' property on the token
is set to true.
I made the change to handle the same use case you have - synonym
injection - and it worked great.
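To illustrate the counting rule described above, here is a minimal standalone sketch in Java. It does not use Lucene at all and is not the actual patch: SketchToken, synonym() and fieldLength() are hypothetical names made up for this example, and only the includeInFieldLength flag mirrors the 'IncludeInFieldLength' property from the message.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone model of the idea; none of these types are Lucene classes.
final class FieldLengthSketch {

    // Hypothetical stand-in for Token carrying the extra flag.
    static final class SketchToken {
        final String term;
        final int positionIncrement;
        // Defaults to true so tokens from existing analyzers still count.
        boolean includeInFieldLength = true;

        SketchToken(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    // Hypothetical helper: an injected synonym shares the position of the
    // original term (increment 0) and is excluded from the field length.
    static SketchToken synonym(String term) {
        SketchToken token = new SketchToken(term, 0);
        token.includeInFieldLength = false;
        return token;
    }

    // Models the altered loop: only flagged tokens increment the length.
    static int fieldLength(List<SketchToken> tokens) {
        int length = 0;
        for (SketchToken token : tokens) {
            if (token.includeInFieldLength) {
                length++;
            }
        }
        return length;
    }

    public static void main(String[] args) {
        List<SketchToken> tokens = new ArrayList<>();
        tokens.add(new SketchToken("fast", 1));
        tokens.add(synonym("quick"));
        tokens.add(synonym("rapid"));
        tokens.add(new SketchToken("car", 1));
        System.out.println(fieldLength(tokens)); // prints 2
    }
}
```

With the flag left at its default, the same four tokens would give a field length of 4, which is the unmodified behavior.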
Michael
-----Original Message-----
From: Artem Chereisky [mailto:[email protected]]
Sent: Wednesday, December 16, 2009 5:30 PM
To: [email protected]
Subject: synonyms
Hi Everyone,
I implemented synonyms using SynonymFilter and SynonymTree classes which
I ported from Java. The solution supports multi-word synonyms and it
seems to work fine.
One problem with this approach is that, although synonyms are at the
same position in the index, each gets counted towards the total number
of terms. That adversely affects lengthNorm. Michael Garski mentioned
earlier that he came across a similar issue and solved it. Am I correct,
Michael? If so, could you share your approach, please?
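To put rough numbers on the lengthNorm effect, here is a small standalone Java sketch. It assumes the default similarity, where lengthNorm is 1 / sqrt(numTerms); the class name and the term counts are made up for illustration, and the norm actually stored in the index is encoded with less precision than shown here.

```java
// Standalone arithmetic only; this does not call into Lucene.
final class LengthNormSketch {

    // Same formula as the default similarity: 1 / sqrt(numTerms).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        int originalTerms = 4;     // hypothetical field with 4 real terms
        int injectedSynonyms = 5;  // hypothetical synonyms at the same positions

        // Norm when only the real terms are counted: 0.5
        System.out.println(lengthNorm(originalTerms));

        // Norm when every synonym is counted as well: about 0.33
        System.out.println(lengthNorm(originalTerms + injectedSynonyms));
    }
}
```

In this made-up case the field's norm drops by about a third purely because of the injected synonyms, even though the text itself is no longer.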
Synonyms are a fairly standard feature. Is there a 'best practice'
solution?
Regards,
Art