Michael McCandless created LUCENE-7760:
------------------------------------------

             Summary: StandardAnalyzer/Tokenizer.setMaxTokenLength's javadocs are lying
                 Key: LUCENE-7760
                 URL: https://issues.apache.org/jira/browse/LUCENE-7760
             Project: Lucene - Core
          Issue Type: Bug
            Reporter: Michael McCandless
            Assignee: Michael McCandless
             Fix For: master (7.0), 6.6


The javadocs claim that too-long tokens are discarded, but in fact they are 
simply chopped into pieces of at most maxTokenLength characters.  The 
following test case unexpectedly passes:

{noformat}
  public void testMaxTokenLengthNonDefault() throws Exception {
    StandardAnalyzer a = new StandardAnalyzer();
    a.setMaxTokenLength(5);
    assertAnalyzesTo(a, "ab cd toolong xy z", new String[]{"ab", "cd", "toolo", "ng", "xy", "z"});
    a.close();
  }
{noformat}
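To make the observed behavior concrete, here is a minimal stand-alone sketch that reproduces the token stream from the test above without Lucene. It is a simplified model, not StandardTokenizer's implementation: the real tokenizer uses Unicode (UAX #29) word-break rules, while this sketch assumes plain whitespace splitting. The class name MaxTokenLengthDemo and the method tokenize are hypothetical. The point it illustrates is that an over-long word is emitted as consecutive chunks of at most maxTokenLength characters rather than being discarded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; NOT Lucene's StandardTokenizer.
public class MaxTokenLengthDemo {

    // Simplified model (assumption): split on whitespace, then chop each word
    // into consecutive pieces of at most maxTokenLength characters.
    static List<String> tokenize(String text, int maxTokenLength) {
        if (maxTokenLength < 1) {
            throw new IllegalArgumentException("maxTokenLength must be >= 1");
        }
        List<String> tokens = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // input was empty or all whitespace
            }
            // Emit the over-long word as chunks instead of dropping it.
            for (int start = 0; start < word.length(); start += maxTokenLength) {
                int end = Math.min(start + maxTokenLength, word.length());
                tokens.add(word.substring(start, end));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Same input and limit as testMaxTokenLengthNonDefault.
        System.out.println(tokenize("ab cd toolong xy z", 5));
    }
}
```

Running main prints [ab, cd, toolo, ng, xy, z], matching the expected tokens in the test; with a limit of 7 or more, "toolong" comes through as a single token.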

We should at least fix the javadocs ...

(I hit this because I was trying to also add {{setMaxTokenLength}} to 
{{EnglishAnalyzer}}).




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
