Hi,

I have tried to get all the tokens from a TokenStream in the same way as I did in the 3.x version of Lucene, but now (at least with WhitespaceTokenizer) I get an exception:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
    at java.lang.Character.codePointAtImpl(Character.java:2405)
    at java.lang.Character.codePointAt(Character.java:2369)
    at org.apache.lucene.analysis.util.CharacterUtils$Java5CharacterUtils.codePointAt(CharacterUtils.java:164)
    at org.apache.lucene.analysis.util.CharTokenizer.incrementToken(CharTokenizer.java:166)
The code is quite simple, and I thought it would work, but obviously it doesn't (unless I have made a mistake). Here it is, in case you spot a bug in it (although it is trivial):

String str = "this is a test";
Reader reader = new StringReader(str);
TokenStream tokenStream = new WhitespaceTokenizer(Version.LUCENE_42, reader);
//tokenStreamAnalyzer.tokenStream("test", reader);
CharTermAttribute attribute = tokenStream.getAttribute(CharTermAttribute.class);
while (tokenStream.incrementToken()) {
    System.out.println(new String(attribute.buffer(), 0, attribute.length()));
}

I hope you have some idea of why this is happening.

Regards,
Andi
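P.S. Re-reading the 4.x TokenStream javadocs, I see they describe a consumer workflow that calls reset() before the first incrementToken(), and end()/close() after the last one — which my snippet above does not do. A sketch with those calls added (the class name TokenDemo is mine, and I have not yet verified this on my setup) — is the missing reset() the cause of the exception?

```java
import java.io.Reader;
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class TokenDemo {
    public static void main(String[] args) throws Exception {
        Reader reader = new StringReader("this is a test");
        TokenStream tokenStream = new WhitespaceTokenizer(Version.LUCENE_42, reader);
        // addAttribute() returns the existing attribute or registers it if absent
        CharTermAttribute attribute = tokenStream.addAttribute(CharTermAttribute.class);

        tokenStream.reset();                       // must be called before the first incrementToken()
        while (tokenStream.incrementToken()) {
            System.out.println(attribute.toString());
        }
        tokenStream.end();                         // finalize offsets/state after the last token
        tokenStream.close();                       // release the underlying Reader
    }
}
```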