[
https://issues.apache.org/jira/browse/LUCENE-7760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15964862#comment-15964862
]
ASF subversion and git services commented on LUCENE-7760:
---------------------------------------------------------
Commit 9ed722f5655639dd572853df5a5a14130323cf0f in lucene-solr's branch
refs/heads/master from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9ed722f ]
LUCENE-7760: improve setMaxTokenLength javadocs for StandardAnalyzer/Tokenizer
and UAX29URLEmailAnalyzer/Tokenizer
> StandardAnalyzer/Tokenizer.setMaxTokenLength's javadocs are lying
> -----------------------------------------------------------------
>
> Key: LUCENE-7760
> URL: https://issues.apache.org/jira/browse/LUCENE-7760
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: master (7.0), 6.6
>
> Attachments: LUCENE-7760.patch, LUCENE-7760.patch
>
>
> The javadocs claim that too-long tokens are discarded, but in fact they are
> simply chopped up. The following test case unexpectedly passes:
> {noformat}
> public void testMaxTokenLengthNonDefault() throws Exception {
>   StandardAnalyzer a = new StandardAnalyzer();
>   a.setMaxTokenLength(5);
>   assertAnalyzesTo(a, "ab cd toolong xy z",
>                    new String[] {"ab", "cd", "toolo", "ng", "xy", "z"});
>   a.close();
> }
> {noformat}
> We should at least fix the javadocs ...
> (I hit this because I was trying to also add {{setMaxTokenLength}} to
> {{EnglishAnalyzer}}).
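To make the two behaviors concrete, here is a standalone sketch (plain Java, no Lucene dependency, hypothetical helper names) contrasting what the tokenizer actually does with over-long tokens (chops them into max-length pieces) against what the old javadocs claimed (discards them), using the same input as the test case above:

```java
import java.util.ArrayList;
import java.util.List;

public class MaxTokenLengthDemo {

  // What StandardTokenizer actually does: tokens longer than max
  // are split into consecutive max-length pieces.
  static List<String> chop(String text, int max) {
    List<String> out = new ArrayList<>();
    for (String tok : text.split("\\s+")) {
      for (int i = 0; i < tok.length(); i += max) {
        out.add(tok.substring(i, Math.min(i + max, tok.length())));
      }
    }
    return out;
  }

  // What the old javadocs claimed: tokens longer than max are dropped.
  static List<String> discard(String text, int max) {
    List<String> out = new ArrayList<>();
    for (String tok : text.split("\\s+")) {
      if (tok.length() <= max) {
        out.add(tok);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(chop("ab cd toolong xy z", 5));
    // [ab, cd, toolo, ng, xy, z]  -- matches the passing test above
    System.out.println(discard("ab cd toolong xy z", 5));
    // [ab, cd, xy, z]             -- what the javadocs described
  }
}
```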
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]