Hi, 
I have tried to get all the tokens from a TokenStream in the same way as I was 
doing in the 3.x version of Lucene, but now (at least with WhitespaceTokenizer) 
I get an exception: 
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
    at java.lang.Character.codePointAtImpl(Character.java:2405)
    at java.lang.Character.codePointAt(Character.java:2369)
    at org.apache.lucene.analysis.util.CharacterUtils$Java5CharacterUtils.codePointAt(CharacterUtils.java:164)
    at org.apache.lucene.analysis.util.CharTokenizer.incrementToken(CharTokenizer.java:166)



The code is quite simple, and I thought it would work, but 
obviously it doesn't (unless I have made some mistake).

Here is the code, in case you spot any bugs in it (although it is trivial):
String str = "this is a test";
Reader reader = new StringReader(str);
TokenStream tokenStream = new WhitespaceTokenizer(Version.LUCENE_42, reader);
//TokenStream tokenStream = analyzer.tokenStream("test", reader);
CharTermAttribute attribute = tokenStream.getAttribute(CharTermAttribute.class);
while (tokenStream.incrementToken()) {
    System.out.println(new String(attribute.buffer(), 0, attribute.length()));
}
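For comparison, the Lucene 4.x TokenStream contract requires calling reset() before the first incrementToken(), and end()/close() when done; skipping reset() is a common cause of this kind of exception after upgrading from 3.x. A sketch of the same loop with that workflow (assuming Lucene 4.2 on the classpath; the class name TokenDump is made up for illustration) would be:

```java
import java.io.Reader;
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class TokenDump {
    public static void main(String[] args) throws Exception {
        Reader reader = new StringReader("this is a test");
        TokenStream tokenStream = new WhitespaceTokenizer(Version.LUCENE_42, reader);
        CharTermAttribute attribute = tokenStream.addAttribute(CharTermAttribute.class);

        tokenStream.reset();   // required in 4.x before the first incrementToken()
        while (tokenStream.incrementToken()) {
            System.out.println(attribute.toString());
        }
        tokenStream.end();     // set final offset state
        tokenStream.close();   // release resources
    }
}
```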

I hope you have some idea of why this is happening.
Regards, 
Andi