Michael McCandless created LUCENE-7760:
------------------------------------------
Summary: StandardAnalyzer/Tokenizer.setMaxTokenLength's javadocs are lying
Key: LUCENE-7760
URL: https://issues.apache.org/jira/browse/LUCENE-7760
Project: Lucene - Core
Issue Type: Bug
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: master (7.0), 6.6
The javadocs claim that too-long tokens are discarded, but in fact they are
simply chopped up. The following test case unexpectedly passes:
{noformat}
public void testMaxTokenLengthNonDefault() throws Exception {
  StandardAnalyzer a = new StandardAnalyzer();
  a.setMaxTokenLength(5);
  // "toolong" exceeds the limit of 5; it comes back as "toolo" + "ng" rather than being dropped
  assertAnalyzesTo(a, "ab cd toolong xy z",
                   new String[]{"ab", "cd", "toolo", "ng", "xy", "z"});
  a.close();
}
{noformat}
We should at least fix the javadocs ...
(I hit this because I was trying to also add {{setMaxTokenLength}} to
{{EnglishAnalyzer}}).
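
To make the gap between the documented and the observed behavior concrete, here is a minimal, self-contained sketch that does not use Lucene at all. It is a hypothetical model, not the StandardTokenizer implementation: it assumes plain whitespace splitting, and the class and method names ({{MaxTokenLengthSketch}}, {{discardTooLong}}, {{chopTooLong}}) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the two behaviors; NOT Lucene code.
// Assumption: tokens are separated by whitespace only.
public class MaxTokenLengthSketch {

    /** What the javadocs describe: tokens longer than maxLen are dropped. */
    public static List<String> discardTooLong(String text, int maxLen) {
        requirePositive(maxLen);
        List<String> out = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty() && word.length() <= maxLen) {
                out.add(word);
            }
        }
        return out;
    }

    /** What the test case shows: long tokens are cut into pieces of at most maxLen chars. */
    public static List<String> chopTooLong(String text, int maxLen) {
        requirePositive(maxLen);
        List<String> out = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            // An empty word (from empty input) has length 0 and adds nothing.
            for (int start = 0; start < word.length(); start += maxLen) {
                int end = Math.min(word.length(), start + maxLen);
                out.add(word.substring(start, end));
            }
        }
        return out;
    }

    private static void requirePositive(int maxLen) {
        if (maxLen < 1) {
            throw new IllegalArgumentException("maxLen must be >= 1, got " + maxLen);
        }
    }

    public static void main(String[] args) {
        String input = "ab cd toolong xy z";
        // prints: discard: [ab, cd, xy, z]
        System.out.println("discard: " + discardTooLong(input, 5));
        // prints: chop:    [ab, cd, toolo, ng, xy, z]
        System.out.println("chop:    " + chopTooLong(input, 5));
    }
}
```

Under these assumptions, the "chop" output matches the token list the test case above asserts, while the "discard" output is what a reader of the current javadocs would expect.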