[ 
https://issues.apache.org/jira/browse/LUCENE-7960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461695#comment-16461695
 ] 

Shawn Heisey commented on LUCENE-7960:
--------------------------------------

My original idea would have been handled by one boolean -- keeping terms 
shorter than minGram.  On more than one occasion, I've fielded questions where 
it turns out the user is trying to search for terms shorter than their minGram 
size.

In discussing it, the notion of *long* terms being removed by the min/max range 
also came up.  It was an idea I had not originally considered, but I have since 
encountered a user who had ngram on the index side but not the query side, and 
who wanted to search for terms longer than their maxGram size.

It could be reduced to one "keep" boolean to keep both short and long terms, 
but I think we're going to have people who want to keep short terms but not 
long terms, and vice versa.
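
To make the two cases concrete, here is a rough sketch in plain Python.  It is 
not the Lucene API and not what the attached patch does; the function name and 
the keep_short / keep_long flags are hypothetical, chosen only to illustrate 
two independent booleans.  It assumes a plain (non-edge) ngram over a single 
term.

```python
def ngrams(term, min_gram, max_gram, keep_short=False, keep_long=False):
    """Illustrative n-gram expansion of one term (hypothetical helper).

    keep_short: emit the term unchanged when it is shorter than min_gram.
    keep_long:  also emit the whole term when it is longer than max_gram.
    Both default to False, which mirrors the "drop it" behavior described
    in the issue.
    """
    if len(term) < min_gram:
        # No gram of at least min_gram characters can be produced.
        return [term] if keep_short else []

    out = []
    for start in range(len(term)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(term):
                out.append(term[start:start + size])

    if keep_long and len(term) > max_gram:
        # The full term is not among the grams, so add it explicitly.
        out.append(term)
    return out


print(ngrams("ab", 3, 5))                   # short term dropped
print(ngrams("ab", 3, 5, keep_short=True))  # short term preserved
print(ngrams("abcd", 2, 3, keep_long=True)) # grams plus the original term
```

With separate flags, someone can preserve short terms without also indexing 
every long term in full, which is the combination a single "keep" boolean 
could not express.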


> NGram filters -- add option to keep short terms
> -----------------------------------------------
>
>                 Key: LUCENE-7960
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7960
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Shawn Heisey
>            Priority: Major
>         Attachments: LUCENE-7960.patch, LUCENE-7960.patch
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When ngram or edgengram filters are used, any terms that are shorter than the 
> minGramSize are completely removed from the token stream.
> This is probably 100% what was intended, but I've seen it cause a lot of 
> problems for users.  I am not suggesting that the default behavior be 
> changed.  That would be far too disruptive to the existing user base.
> I do think there should be a new boolean option, with a name like 
> keepShortTerms, that defaults to false, to allow the short terms to be 
> preserved.



