Trejkaz created LUCENE-6584:
-------------------------------

             Summary: Docs on StandardTokenizer don't mention the behaviour change in Version.LUCENE_4_7_0
                 Key: LUCENE-6584
                 URL: https://issues.apache.org/jira/browse/LUCENE-6584
             Project: Lucene - Core
          Issue Type: Bug
          Components: modules/analysis
    Affects Versions: 4.10.4
            Reporter: Trejkaz
            Priority: Minor


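The following test shows that the behaviour of StandardTokenizer differs once 
you start passing Version.LUCENE_4_7_0 or later. It feeds the tokenizer a single 
run of 2550 'x' characters (ten times the default maximum token length of 255) 
and asserts that no token comes out. The result is not the same for 
Version.LUCENE_4_6_1 and Version.LUCENE_4_7_0: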

{code}
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;
import org.junit.Test;

import static org.hamcrest.Matchers.is;
import static org.junit.Assert.assertThat;

public class TestStandardTokenizerStandalone
{
    @Test
    public void testLucene4_6_1() throws Exception
    {
        doTest(Version.LUCENE_4_6_1);
    }

    @Test
    public void testLucene4_7_0() throws Exception
    {
        doTest(Version.LUCENE_4_7_0);
    }

    public void doTest(Version version) throws Exception
    {
        // 2550 characters: far longer than the default maximum token length (255)
        try (TokenStream stream = new StandardTokenizer(version,
                new StringReader(makeLongString(2550))))
        {
            stream.reset();

            assertThat(stream.incrementToken(), is(false));
        }
    }

    private String makeLongString(int length)
    {
        StringBuilder builder = new StringBuilder(length);
        for (int i = 0; i < length; i++)
        {
            builder.append('x');
        }
        return builder.toString();
    }
}
{code}

However, the StandardTokenizer Javadoc only describes the behaviour changes in 
versions 3.1 and 3.4. It says nothing about 4.7.0.
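The Version argument presumably acts as a switch: the tokenizer compares it 
against the releases where its rules changed and picks the matching behaviour. 
The minimal sketch below illustrates that pattern. It is not Lucene's source. 
VersionGateSketch, Release, rulesFor and the rule-set names are all hypothetical. 
The thresholds are the two the Javadoc documents (3.1, 3.4) plus the 4.7.0 one 
the test above exposes. Callers can only pin old behaviour if they know every 
threshold, which is why the missing 4.7.0 entry matters.

{code}
/**
 * Hypothetical sketch (not Lucene source): a tokenizer-like component that uses
 * a "match version" argument to choose which tokenization rules to apply.
 */
public class VersionGateSketch
{
    /** Hypothetical stand-in for a release number such as 4.7.0. */
    static final class Release implements Comparable<Release>
    {
        final int major, minor, bugfix;

        Release(int major, int minor, int bugfix)
        {
            this.major = major;
            this.minor = minor;
            this.bugfix = bugfix;
        }

        @Override
        public int compareTo(Release other)
        {
            if (major != other.major) return Integer.compare(major, other.major);
            if (minor != other.minor) return Integer.compare(minor, other.minor);
            return Integer.compare(bugfix, other.bugfix);
        }

        boolean onOrAfter(Release other)
        {
            return compareTo(other) >= 0;
        }
    }

    // Releases where the (hypothetical) rules changed: 3.1 and 3.4 as documented,
    // 4.7.0 as observed in the test but missing from the Javadoc.
    static final Release V3_1 = new Release(3, 1, 0);
    static final Release V3_4 = new Release(3, 4, 0);
    static final Release V4_7 = new Release(4, 7, 0);

    /** Returns the name of the rule set that a given match version selects. */
    static String rulesFor(Release matchVersion)
    {
        if (matchVersion.onOrAfter(V4_7)) return "rules-4.7";
        if (matchVersion.onOrAfter(V3_4)) return "rules-3.4";
        if (matchVersion.onOrAfter(V3_1)) return "rules-3.1";
        return "rules-pre-3.1";
    }

    public static void main(String[] args)
    {
        // 4.6.1 and 4.7.0 select different rule sets, so the same input can
        // tokenize differently depending on which version the caller passes.
        System.out.println(rulesFor(new Release(4, 6, 1))); // rules-3.4
        System.out.println(rulesFor(new Release(4, 7, 0))); // rules-4.7
        System.out.println(rulesFor(new Release(3, 0, 0))); // rules-pre-3.1
    }
}
{code}

Once the version parameter is removed, every caller gets only the newest branch. 
That is why pinning 4.6.1 behaviour would require keeping a copy of the old code.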

The constructor that takes a Version is deprecated, presumably on the mistaken 
assumption that the tokeniser's behaviour did not change during Lucene 4. The 
Version parameter was removed entirely in Lucene 5. This presumably means that 
anyone who tokenised their text with Lucene 4.6 or earlier is now stuck and has 
to copy the Lucene 4 tokeniser into their own code to keep their queries 
working.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
