[
https://issues.apache.org/jira/browse/LUCENE-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658308#action_12658308
]
Hoss Man commented on LUCENE-1491:
----------------------------------
patch looks good ... the one question i have is whether the fix meets user
expectations: the patch as posted "skips" any input tokens that are shorter
than the minimum ngram length ... is that what most people will expect, or will
people expect shorter tokens to be passed through?
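ie: should "min" be the minimum size of any token produced by the filter (a
hard min, so shorter input tokens are dropped), or should it only be the
minimum size of the ngrams the filter generates (a soft min, so shorter input
tokens pass through unchanged)?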
either way this patch is an improvement, i'm just wondering what we want to
define the semantics to be (or if we want to make an additional option for this)
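
to make the two readings concrete, here is a rough sketch in plain Java. it is
not the Lucene code and not the attached patch: the class name, the
ShortTokenPolicy enum, and the edgeGrams method are all made up for
illustration, and it only models front-edge grams over a list of strings.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of front-edge n-gram generation.
// Hypothetical names throughout; this is not EdgeNGramTokenFilter itself.
final class EdgeGramSketch {

    // The two candidate semantics for tokens shorter than minGram.
    enum ShortTokenPolicy {
        SKIP,         // "hard min": nothing shorter than minGram is ever emitted
        PASS_THROUGH  // "soft min": short tokens are emitted unchanged
    }

    static List<String> edgeGrams(List<String> tokens, int minGram, int maxGram,
                                  ShortTokenPolicy policy) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (token.length() < minGram) {
                // Either way, keep consuming the stream instead of stopping.
                if (policy == ShortTokenPolicy.PASS_THROUGH) {
                    out.add(token);
                }
                continue;
            }
            // Emit prefixes of length minGram..min(maxGram, token length).
            int longest = Math.min(maxGram, token.length());
            for (int n = minGram; n <= longest; n++) {
                out.add(token.substring(0, n));
            }
        }
        return out;
    }
}
```

with min=2, max=3 and the input tokens "a", "lucene", the SKIP policy would
give "lu", "luc", while PASS_THROUGH would give "a", "lu", "luc".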
> EdgeNGramTokenFilter stops on tokens smaller then minimum gram size.
> --------------------------------------------------------------------
>
> Key: LUCENE-1491
> URL: https://issues.apache.org/jira/browse/LUCENE-1491
> Project: Lucene - Java
> Issue Type: Bug
> Components: Analysis
> Affects Versions: 2.4, 2.4.1, 2.9, 3.0
> Reporter: Todd Feak
> Attachments: LUCENE-1491.patch
>
>
> If a token is encountered in the stream that is shorter in length than the
> min gram size, the filter will stop processing the token stream.
> Working up a unit test now, but may be a few days before I can provide it.
> Wanted to get it in the system.