[ https://issues.apache.org/jira/browse/LUCENE-7960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467894#comment-16467894 ]
Robert Muir commented on LUCENE-7960:
-------------------------------------
Yes, I think we should deprecate. It helps people upgrade, and it shouldn't be
too bad in this case.
If we currently have a 1-arg (TokenStream) and a 3-arg (TokenStream, int, int)
constructor, and we want to end up with a 2-arg (TokenStream, int) and a 4-arg
(TokenStream, int, int, boolean) constructor, then 7.x can temporarily have
four constructors: the existing two are deprecated and forward to the new
ones. Their javadoc can even explain what the forwarding does. master would
have just the two new ones, with no cruft.
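Here is a minimal sketch of the 7.x constructor layout described above. It rests on a few assumptions: `TokenStream` is a hypothetical stub, not Lucene's class. `NGramFilterSketch` is likewise a hypothetical name. The defaults used by the old 1-arg form (1 and 2) are illustrative values. The sketch only illustrates the forwarding: both deprecated constructors delegate to the new 4-arg one with `preserveOriginal=false`, so existing callers keep today's behavior.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Lucene's TokenStream so the sketch compiles alone.
class TokenStream {
    final List<String> terms;

    TokenStream(List<String> terms) {
        this.terms = terms;
    }
}

// Hypothetical filter showing four constructors, two of them deprecated.
class NGramFilterSketch {
    // Illustrative defaults for the old 1-arg constructor (assumed values).
    static final int DEFAULT_MIN_GRAM = 1;
    static final int DEFAULT_MAX_GRAM = 2;

    final TokenStream input;
    final int minGram;
    final int maxGram;
    final boolean preserveOriginal;

    /** New 2-arg form: one fixed gram size, original not preserved. */
    NGramFilterSketch(TokenStream input, int gramSize) {
        this(input, gramSize, gramSize, false);
    }

    /** New 4-arg form: the canonical constructor all others forward to. */
    NGramFilterSketch(TokenStream input, int minGram, int maxGram, boolean preserveOriginal) {
        if (minGram < 1 || maxGram < minGram) {
            throw new IllegalArgumentException("need 1 <= minGram <= maxGram");
        }
        this.input = input;
        this.minGram = minGram;
        this.maxGram = maxGram;
        this.preserveOriginal = preserveOriginal;
    }

    /**
     * @deprecated Forwards to the 4-arg constructor with the default sizes and
     *     preserveOriginal=false, which keeps the existing behavior.
     */
    @Deprecated
    NGramFilterSketch(TokenStream input) {
        this(input, DEFAULT_MIN_GRAM, DEFAULT_MAX_GRAM, false);
    }

    /** @deprecated Forwards to the 4-arg constructor with preserveOriginal=false. */
    @Deprecated
    NGramFilterSketch(TokenStream input, int minGram, int maxGram) {
        this(input, minGram, maxGram, false);
    }
}

public class ForwardingDemo {
    public static void main(String[] args) {
        TokenStream ts = new TokenStream(Arrays.asList("a", "lucene"));
        NGramFilterSketch legacy = new NGramFilterSketch(ts, 2, 3);          // deprecated path
        NGramFilterSketch explicit = new NGramFilterSketch(ts, 2, 3, false); // new path
        System.out.println(legacy.preserveOriginal == explicit.preserveOriginal); // true
    }
}
```

On master, the two @Deprecated constructors would simply be deleted, leaving only the 2-arg and 4-arg forms.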
> NGram filters -- preserve the original token when it is outside the min/max
> size range
> --------------------------------------------------------------------------------------
>
> Key: LUCENE-7960
> URL: https://issues.apache.org/jira/browse/LUCENE-7960
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/analysis
> Reporter: Shawn Heisey
> Priority: Major
> Attachments: LUCENE-7960.patch, LUCENE-7960.patch
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> When ngram or edgengram filters are used, any terms that are shorter than the
> minGramSize are completely removed from the token stream.
> This is probably 100% what was intended, but I've seen it cause a lot of
> problems for users. I am not suggesting that the default behavior be
> changed. That would be far too disruptive to the existing user base.
> I do think there should be a new boolean option, with a name like
> keepShortTerms, that defaults to false, to allow the short terms to be
> preserved.
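To make the proposed option concrete, here is a minimal standalone sketch of gram generation with a keepShortTerms flag. It is not the code of Lucene's NGramTokenFilter or EdgeNGramTokenFilter. The class and method names are hypothetical, the sketch works on plain Strings instead of a token stream, and it measures length in UTF-16 chars rather than code points. With the flag off, a term shorter than minGram produces nothing, as the description says happens today. With the flag on, the term itself is kept.

```java
import java.util.ArrayList;
import java.util.List;

public class KeepShortTermsSketch {

    /** All n-grams of sizes minGram..maxGram, ordered by start offset, then size. */
    static List<String> ngrams(String term, int minGram, int maxGram, boolean keepShortTerms) {
        List<String> out = new ArrayList<>();
        if (term.length() < minGram) {
            // Default (false): the short term vanishes. Proposed option: keep it.
            if (keepShortTerms) {
                out.add(term);
            }
            return out;
        }
        for (int start = 0; start < term.length(); start++) {
            for (int size = minGram; size <= maxGram && start + size <= term.length(); size++) {
                out.add(term.substring(start, start + size));
            }
        }
        return out;
    }

    /** Edge n-grams: only prefixes of sizes minGram..maxGram. */
    static List<String> edgeNgrams(String term, int minGram, int maxGram, boolean keepShortTerms) {
        List<String> out = new ArrayList<>();
        if (term.length() < minGram) {
            if (keepShortTerms) {
                out.add(term);
            }
            return out;
        }
        for (int size = minGram; size <= Math.min(maxGram, term.length()); size++) {
            out.add(term.substring(0, size));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("ab", 3, 4, false));        // []
        System.out.println(ngrams("ab", 3, 4, true));         // [ab]
        System.out.println(edgeNgrams("lucene", 3, 4, true)); // [luc, luce]
    }
}
```

The issue title also mentions tokens above the max size. This sketch covers only the short-term case that the description asks for.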
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)