Artem,

Here's a description of the change I made to allow customizing the
length norm when indexing synonyms. I do not have a patch available
for it at this time. I made the change for 2.4, but the same approach
could be taken in 2.9. There may be a better way of implementing it
using Attributes, but I have not yet investigated that.

I added a bool property to Token named 'IncludeInFieldLength' that
defaulted to true. Then, in a custom analyzer, if I did not want a Token
to count towards the field length, I would set the value to false.
Within the DocInverterPerField class I altered the internals of
processFields(Fieldable[] fields, int count) to only increment the value
of fieldState.length if the 'IncludeInFieldLength' property on the token
is set to true.
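
A rough sketch of the idea, not the actual patch. It is written in plain
Java rather than against Lucene.Net, and everything in it is a simplified
stand-in: the Token class, the fieldLength loop, and the synonym helper are
hypothetical names used only for illustration. The lengthNorm method assumes
the default similarity's 1 / sqrt(numTerms) formula.

```java
import java.util.ArrayList;
import java.util.List;

public class FieldLengthSketch {

    // Hypothetical stand-in for Token, reduced to what matters here.
    static final class Token {
        final String term;
        final int positionIncrement;         // 0 = same position as the previous token
        boolean includeInFieldLength = true; // the added flag, defaulting to true

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    // Models the altered counting loop: only flagged tokens add to the length.
    static int fieldLength(List<Token> tokens) {
        int length = 0;
        for (Token t : tokens) {
            if (t.includeInFieldLength) {
                length++;
            }
        }
        return length;
    }

    // Assumed default formula: 1 / sqrt(numTerms).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // What a custom analyzer would do for an injected synonym:
    // same position as the original term, excluded from the field length.
    static Token synonym(String term) {
        Token t = new Token(term, 0);
        t.includeInFieldLength = false;
        return t;
    }

    public static void main(String[] args) {
        List<Token> tokens = new ArrayList<>();
        tokens.add(new Token("quick", 1));
        tokens.add(synonym("fast"));
        tokens.add(new Token("car", 1));
        tokens.add(synonym("automobile"));

        // Four tokens are indexed, but only the two originals are counted.
        int length = fieldLength(tokens);
        System.out.println("field length: " + length);
        System.out.println("length norm:  " + lengthNorm(length));
    }
}
```

With the flag, the field above is scored as a two-term field (norm of
1/sqrt(2)); without it, the two synonyms would push the count to four and
the norm down to 0.5.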

I made the change to handle the same use case you have - synonym
injection - and it worked great.

Michael

-----Original Message-----
From: Artem Chereisky [mailto:a.cherei...@gmail.com] 
Sent: Wednesday, December 16, 2009 5:30 PM
To: lucene-net-user@incubator.apache.org
Subject: synonyms

Hi Everyone,

I implemented synonyms using SynonymFilter and SynonymTree classes which
I ported from Java. The solution supports multi-word synonyms and it
seems to work fine.

One problem with this approach is that, although synonyms are at the
same position in the index, each gets counted towards the total number
of terms. That adversely affects lengthNorm. Michael Garski mentioned
earlier that he came across a similar issue and solved it. Am I correct,
Michael? If so, could you share your approach, please?

Synonyms are a fairly standard feature. Is there a 'best practice'
solution?

Regards,
Art
