[
https://issues.apache.org/jira/browse/LUCENENET-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828841#action_12828841
]
Artem Chereisky commented on LUCENENET-337:
-------------------------------------------
I found a Java version of a multi-word synonym filter,
http://www.java2s.com/Open-Source/Java-Document/Search-Engine/apache-solr-1.2.0/org/apache/solr/analysis/SynonymFilter.java.htm,
and ported it to C#. I thought it was a de facto standard, but now I'm
beginning to realize there is no standard.
The issue is that it uses look-ahead to determine the longest possible
match, and I can't figure out how to do look-ahead using
IncrementToken().
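One direction that might work, sketched here as an assumption and not taken from the Solr filter or the attached patch: buffer the tokens read ahead as captured attribute states and replay them later. The sketch assumes the Lucene.Net 2.9 attribute API (CaptureState() and RestoreState() on AttributeSource); LookaheadTokenFilter and ReadAhead() are hypothetical names invented for illustration.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Util;

// Hypothetical helper, not part of Lucene.Net or of LengthNorm.patch.
public abstract class LookaheadTokenFilter : TokenFilter
{
    // Tokens already pulled from the input but not yet returned to the consumer.
    private readonly LinkedList<AttributeSource.State> pending =
        new LinkedList<AttributeSource.State>();

    protected LookaheadTokenFilter(TokenStream input) : base(input)
    {
    }

    // Pulls one more token from the input and buffers a snapshot of its
    // attributes. Returns false when the input is exhausted. After a
    // successful call the attributes hold the look-ahead token's values,
    // so a subclass can inspect them before deciding what to emit.
    protected bool ReadAhead()
    {
        if (!input.IncrementToken())
        {
            return false;
        }
        pending.AddLast(CaptureState());
        return true;
    }

    public override bool IncrementToken()
    {
        // Replay buffered tokens in their original order; read a fresh
        // token only when nothing is buffered.
        if (pending.Count == 0 && !ReadAhead())
        {
            return false;
        }
        RestoreState(pending.First.Value);
        pending.RemoveFirst();
        return true;
    }

    public override void Reset()
    {
        base.Reset();
        pending.Clear();
    }
}
```

A multi-word synonym filter built on this could call ReadAhead() while the buffered terms still extend a candidate match, then either emit the synonym or let the buffered tokens replay unchanged.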
> TokenAttribute for Selectively Including Tokens in Length Norm
> --------------------------------------------------------------
>
> Key: LUCENENET-337
> URL: https://issues.apache.org/jira/browse/LUCENENET-337
> Project: Lucene.Net
> Issue Type: Improvement
> Reporter: Michael Garski
> Priority: Minor
> Attachments: LengthNorm.patch
>
>
> This patch adds functionality to Lucene.Net that allows a TokenFilter to mark
> a Token as not to be included in the length norm calculation through the use
> of a new TokenAttribute interface LengthNormAttribute and a corresponding
> implementation LengthNormAttributeImpl. This functionality is useful to
> prevent the increase of the length norm during synonym injection,
> particularly in cases where there are a large number of synonyms in relation
> to the number of original tokens.
> Following is an example of how to use the new attribute.
> Within your custom TokenFilter, define a field to hold a reference to the
> attribute and set its value in the constructor. When the stream advances
> to a new Token within the call to IncrementToken(), set the
> IncludeInLengthNorm property of the attribute to false for Tokens
> that should not be included in the length norm calculation. The property
> defaults to true and is reset to true after each Token is consumed within
> DocInverterPerField.ProcessFields.
> {code:title=CustomTokenFilter.cs|borderStyle=solid}
> public class CustomTokenFilter : TokenFilter
> {
>     private LengthNormAttribute lnAttribute;
>
>     public CustomTokenFilter(TokenStream input) : base(input)
>     {
>         this.lnAttribute =
>             (LengthNormAttribute)AddAttribute(typeof(LengthNormAttribute));
>     }
>
>     public override bool IncrementToken()
>     {
>         if (input.IncrementToken())
>         {
>             // make determination that the token is not to be
>             // included in the length norm value
>             // this example marks all tokens to not be
>             // included in the length norm value
>             this.lnAttribute.IncludeInLengthNorm = false;
>             return true;
>         }
>         else
>         {
>             return false;
>         }
>     }
> }
> {code}