[
https://issues.apache.org/jira/browse/LUCENE-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801450#action_12801450
]
Robert Muir commented on LUCENE-2198:
-------------------------------------
bq. Surely the native clone() invoked for every additional attribute counts for
something?
But it looks to me like only each Attribute*Impl* is cloned, so if you are
worried about this and using a lot of attributes you could use your own
AttributeFactory, similar to Token.TOKEN_ATTRIBUTE_FACTORY to pack everything
however you see fit, right?
I think exposing this as a boolean in the interface is correct, and I think
the reference implementation is correct too.
> support protected words in Stemming TokenFilters
> ------------------------------------------------
>
> Key: LUCENE-2198
> URL: https://issues.apache.org/jira/browse/LUCENE-2198
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Analysis
> Affects Versions: 3.0
> Reporter: Robert Muir
> Priority: Minor
> Attachments: LUCENE-2198.patch, LUCENE-2198.patch
>
>
> This is from LUCENE-1515.
> I propose that all stemming TokenFilters have an 'exclusion set' that
> bypasses any stemming for words in this set.
> Some stemming TokenFilters have this, some do not.
> This would be one way for Karl to implement his new Swedish stemmer (as a
> text file of ignore words).
> Additionally, it would remove duplication between Lucene and Solr, as they
> reimplement SnowballFilter since it does not have this functionality.
> Finally, I think this is a pretty common use case, where people want to
> ignore things like proper nouns in the stemming.
> As an alternative design I considered a case where we generalized this to
> CharArrayMap (and ignoring words would mean mapping them to themselves),
> which would also provide a mechanism to override the stemming algorithm. But
> I think this is too expert, could be its own filter, and the only example of
> this I can find is in the Dutch stemmer.
> So I think we should just provide ignore with CharArraySet, but if you feel
> otherwise please comment.
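
To make the two designs in the description concrete, here is a rough sketch in plain Java. It does not use any Lucene classes: java.util.Set and java.util.Map stand in for CharArraySet and CharArrayMap, and the class name, method names, and toy stemmer are all hypothetical, made up for illustration only. It is not what the attached patches do.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical illustration only; none of these names come from Lucene.
public class ProtectedStemSketch {

    // Toy stemmer (an assumption, not a real algorithm): strips one trailing
    // "s" from terms longer than three characters.
    static final UnaryOperator<String> TOY_STEMMER =
        w -> (w.length() > 3 && w.endsWith("s")) ? w.substring(0, w.length() - 1) : w;

    // Exclusion-set design: a term found in the set bypasses stemming entirely.
    static String stemWithExclusions(String term, Set<String> exclusions,
                                     UnaryOperator<String> stemmer) {
        return exclusions.contains(term) ? term : stemmer.apply(term);
    }

    // Map design: a term found in the map is replaced by its mapped value.
    // Mapping a term to itself protects it; mapping it to something else
    // overrides the stemmer's result for that term.
    static String stemWithOverrides(String term, Map<String, String> overrides,
                                    UnaryOperator<String> stemmer) {
        String mapped = overrides.get(term);
        return mapped != null ? mapped : stemmer.apply(term);
    }

    public static void main(String[] args) {
        Set<String> exclusions = Set.of("Lucas");
        System.out.println(stemWithExclusions("Lucas", exclusions, TOY_STEMMER));
        System.out.println(stemWithExclusions("stems", exclusions, TOY_STEMMER));

        Map<String, String> overrides = Map.of("Lucas", "Lucas", "mice", "mouse");
        System.out.println(stemWithOverrides("Lucas", overrides, TOY_STEMMER));
        System.out.println(stemWithOverrides("mice", overrides, TOY_STEMMER));
    }
}
```

The set variant only answers "stem this or not", which matches the common case of protecting proper nouns. The map variant can do the same thing and also override individual results, at the cost of a more expert-level configuration.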