Geoffrey Lawson created LUCENE-9963:
---------------------------------------

Summary: Flatten graph filter has errors when there are holes at beginning or end of alternate paths
Key: LUCENE-9963
URL: https://issues.apache.org/jira/browse/LUCENE-9963
Project: Lucene - Core
Issue Type: Bug
Components: modules/analysis
Affects Versions: 8.8
Reporter: Geoffrey Lawson


If asserts are enabled, gaps at the beginning or end of an alternate path can result in assertion errors, e.g.:
{code:java}
java.lang.AssertionError: 2
	at org.apache.lucene.analysis.core.FlattenGraphFilter.releaseBufferedToken(FlattenGraphFilter.java:195)
{code}
Or
{code:java}
java.lang.AssertionError
	at org.apache.lucene.analysis.core.FlattenGraphFilter.releaseBufferedToken(FlattenGraphFilter.java:191)
{code}

If asserts are not enabled, the same conditions result in either IndexOutOfBoundsExceptions or dropped tokens.
{code:java}
java.lang.ArrayIndexOutOfBoundsException: Index -2 out of bounds for length 8
	at org.apache.lucene.util.RollingBuffer.get(RollingBuffer.java:109)
	at org.apache.lucene.analysis.core.FlattenGraphFilter.incrementToken(FlattenGraphFilter.java:325)
{code}

These issues can be reproduced with the following unit tests:
{code:java}
public void testAltPathFirstStepHole() throws IOException {
  // token(term, posInc, posLength, startOffset, endOffset)
  TokenStream in = new CannedTokenStream(0, 3, new Token[]{
      token("abc", 1, 3, 0, 3),
      token("b", 1, 1, 1, 2),
      token("c", 1, 1, 2, 3)
  });

  TokenStream out = new FlattenGraphFilter(in);

  assertTokenStreamContents(out,
      new String[]{"abc", "b", "c"},
      new int[] {0, 1, 2},  // start offsets
      new int[] {3, 2, 3},  // end offsets
      new int[] {1, 1, 1},  // position increments
      new int[] {3, 1, 1},  // position lengths; token 0 may need to be len 1 after flattening
      3);                   // final offset
}
{code}
{code:java}
public void testAltPathLastStepHole() throws IOException {
  TokenStream in = new CannedTokenStream(0, 4, new Token[]{
      token("abc", 1, 3, 0, 3),
      token("a", 0, 1, 0, 1),
      token("b", 1, 1, 1, 2),
      token("d", 2, 1, 3, 4)
  });

  TokenStream out = new FlattenGraphFilter(in);

  assertTokenStreamContents(out,
      new String[]{"abc", "a", "b", "d"},
      new int[] {0, 0, 1, 3},  // start offsets
      new int[] {1, 1, 2, 4},  // end offsets
      new int[] {1, 0, 1, 2},  // position increments
      new int[] {3, 1, 1, 1},  // position lengths
      4);                      // final offset
}
{code}
{code:java}
public void testAltPathLastStepHoleWithoutEndToken() throws IOException {
  TokenStream in = new CannedTokenStream(0, 2, new Token[]{
      token("abc", 1, 3, 0, 3),
      token("a", 0, 1, 0, 1),
      token("b", 1, 1, 1, 2)
  });

  TokenStream out = new FlattenGraphFilter(in);

  assertTokenStreamContents(out,
      new String[]{"abc", "a", "b"},
      new int[] {0, 0, 1},  // start offsets
      new int[] {1, 1, 2},  // end offsets
      new int[] {1, 0, 1},  // position increments
      new int[] {1, 1, 1},  // position lengths
      2);                   // final offset
}
{code}

I believe LUCENE-8723 is a related issue, as it looks like the last token in an alternate path is being deleted.
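
To make the "holes" in these tests concrete, here is a minimal sketch that does not depend on Lucene. It converts each token's position increment and position length into a from/to node pair and then reports interior nodes that have an arc on only one side. The class `GraphHoles` and its methods `toArcs` and `findHoles` are hypothetical names used for illustration. They are not part of Lucene and do not reflect how FlattenGraphFilter is implemented. The sketch assumes positions start at -1, so a first position increment of 1 lands on node 0.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Lucene: locates holes in a small token graph. */
final class GraphHoles {

    /**
     * Converts {posInc, posLen} pairs into {from, to} node pairs.
     * Assumes the first token has posInc >= 1 so that no node is negative.
     */
    static int[][] toArcs(int[][] tokens) {
        int[][] arcs = new int[tokens.length][];
        int pos = -1; // assumed convention: a first posInc of 1 lands on node 0
        for (int i = 0; i < tokens.length; i++) {
            pos += tokens[i][0];
            arcs[i] = new int[] {pos, pos + tokens[i][1]};
        }
        return arcs;
    }

    /** Reports interior nodes that have an arc on one side only. */
    static List<String> findHoles(int[][] arcs) {
        int last = 0;
        for (int[] arc : arcs) {
            last = Math.max(last, arc[1]);
        }
        boolean[] hasIn = new boolean[last + 1];
        boolean[] hasOut = new boolean[last + 1];
        for (int[] arc : arcs) {
            hasOut[arc[0]] = true;
            hasIn[arc[1]] = true;
        }
        List<String> holes = new ArrayList<>();
        // Node 0 (graph start) and node `last` (graph end) are skipped on purpose.
        for (int node = 1; node < last; node++) {
            if (hasOut[node] && !hasIn[node]) {
                holes.add("nothing arrives at node " + node);
            }
            if (hasIn[node] && !hasOut[node]) {
                holes.add("nothing leaves node " + node);
            }
        }
        return holes;
    }

    public static void main(String[] args) {
        // testAltPathFirstStepHole: abc(1,3), b(1,1), c(1,1)
        System.out.println(findHoles(toArcs(new int[][] {{1, 3}, {1, 1}, {1, 1}})));
        // prints [nothing arrives at node 1]

        // testAltPathLastStepHole: abc(1,3), a(0,1), b(1,1), d(2,1)
        System.out.println(findHoles(toArcs(new int[][] {{1, 3}, {0, 1}, {1, 1}, {2, 1}})));
        // prints [nothing leaves node 2]
    }
}
```

Read this way, the first test has an alternate path (b, c) whose first step from node 0 to node 1 is missing, and the second and third tests have an alternate path (a, b) whose last step from node 2 to node 3 is missing, which matches the two cases named in the summary.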