[ 
https://issues.apache.org/jira/browse/LUCENE-7960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16421475#comment-16421475
 ] 

Ingomar Wesp commented on LUCENE-7960:
--------------------------------------

I'd like to propose a patch (see attached pull request #349) that adds two 
options to the EdgeNGramFilter:
 * keepShortTerms: Causes the filter to pass through input terms that are 
shorter than the minimum gram size.
 * keepLongTerms: Causes the filter to pass through input terms that are longer 
than the maximum gram size.
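
To illustrate the intended behaviour of the two options, here is a minimal, Lucene-independent sketch. It is not the code from the pull request: the class name EdgeNGramSketch, the method edgeNGrams, and the choice to emit a kept long term after its grams are assumptions made only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone model of the proposed options; not the actual patch.
final class EdgeNGramSketch {

    // Returns the front edge n-grams of a single term.
    // Lengths are counted in Java chars here for simplicity (an assumption).
    static List<String> edgeNGrams(String term, int minGram, int maxGram,
                                   boolean keepShortTerms, boolean keepLongTerms) {
        List<String> out = new ArrayList<>();
        int len = term.length();

        // A term shorter than minGram yields no grams at all;
        // with keepShortTerms it is passed through unchanged instead.
        if (len < minGram) {
            if (keepShortTerms) {
                out.add(term);
            }
            return out;
        }

        // Emit prefixes of length minGram .. min(maxGram, len).
        for (int n = minGram; n <= Math.min(maxGram, len); n++) {
            out.add(term.substring(0, n));
        }

        // A term longer than maxGram is additionally passed through
        // when keepLongTerms is set (ordering is an assumption).
        if (len > maxGram && keepLongTerms) {
            out.add(term);
        }
        return out;
    }
}
```

With minGram=3 and maxGram=5, edgeNGrams("ab", 3, 5, false, false) returns an empty list, while enabling keepShortTerms returns [ab]. For "lucene", the default output is [luc, luce, lucen], and enabling keepLongTerms appends the full term, giving [luc, luce, lucen, lucene].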

I'm not entirely sure about the usefulness of keepLongTerms, but the ability 
to pass through short terms would certainly be neat for queries where 
you'd like to match ALL tokens as either prefixes or exact terms, but some 
query tokens are shorter than the minimum gram size. As far as I understand, a 
second field containing the exact terms isn't really a viable alternative 
there, because you can easily run into situations where only a subset of the 
query tokens matches in either field.
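
The two-field problem can be shown with a toy model. The sketch below uses plain Java sets rather than a real index, and the names ShortTermMatchSketch, indexPrefixField and allMatch are hypothetical. It assumes the query requires every token to match within a single field.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of "all query tokens must match" against one field at a time.
final class ShortTermMatchSketch {

    // Index-time terms of a prefix field: edge n-grams of every token,
    // optionally keeping tokens shorter than minGram (the proposed option).
    static Set<String> indexPrefixField(List<String> tokens, int minGram,
                                        int maxGram, boolean keepShortTerms) {
        Set<String> terms = new HashSet<>();
        for (String t : tokens) {
            if (t.length() < minGram) {
                if (keepShortTerms) {
                    terms.add(t);
                }
                continue;
            }
            for (int n = minGram; n <= Math.min(maxGram, t.length()); n++) {
                terms.add(t.substring(0, n));
            }
        }
        return terms;
    }

    // True only if every query token is present in the given field.
    static boolean allMatch(Set<String> fieldTerms, List<String> queryTokens) {
        return fieldTerms.containsAll(queryTokens);
    }
}
```

Take a document with the tokens "an", "apache", "project", minGram=3, and the query tokens "an" and "apa". Against the plain prefix field the query fails because "an" was dropped. Against an exact-term field it fails because "apa" is not a full term. Only the prefix field built with keepShortTerms contains both tokens.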

> NGram filters -- add option to keep short terms
> -----------------------------------------------
>
>                 Key: LUCENE-7960
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7960
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Shawn Heisey
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When ngram or edgengram filters are used, any terms that are shorter than the 
> minGramSize are completely removed from the token stream.
> This is probably 100% what was intended, but I've seen it cause a lot of 
> problems for users.  I am not suggesting that the default behavior be 
> changed.  That would be far too disruptive to the existing user base.
> I do think there should be a new boolean option, with a name like 
> keepShortTerms, that defaults to false, to allow the short terms to be 
> preserved.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

