Hmm, I suspect this is a bug in the position length implementation of CommonGramsFilter.
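To make the concern concrete, here is a rough standalone model (plain Java, not Lucene code; the Tok class and both helper methods are made-up names for illustration). It assumes the second token starts at the position where the first one ends, and compares a bigram whose position length is fixed at 2 with one whose length is derived from the tokens it covers.

```java
// Simplified model of tokens in a token graph. This is NOT Lucene's attribute
// API; Tok, bigramHardcoded and bigramSpanning are hypothetical names.
final class PosLenSketch {

    /** A token with its text, start position (graph node) and position length. */
    static final class Tok {
        final String text;
        final int pos;
        final int posLen;

        Tok(String text, int pos, int posLen) {
            this.text = text;
            this.pos = pos;
            this.posLen = posLen;
        }

        /** Graph node at which this token ends. */
        int endPos() {
            return pos + posLen;
        }
    }

    /** Builds a bigram at the first token's position with a fixed length of 2. */
    static Tok bigramHardcoded(Tok first, Tok second) {
        return new Tok(first.text + "_" + second.text, first.pos, 2);
    }

    /**
     * Builds a bigram that ends where the second token ends.
     * Assumption: second starts exactly at first.endPos().
     */
    static Tok bigramSpanning(Tok first, Tok second) {
        return new Tok(first.text + "_" + second.text, first.pos, second.endPos() - first.pos);
    }

    public static void main(String[] args) {
        // Linear input: every token has posLen == 1, so both variants agree.
        Tok a = new Tok("this", 0, 1);
        Tok b = new Tok("is", 1, 1);
        System.out.println(bigramHardcoded(a, b).endPos()); // 2
        System.out.println(bigramSpanning(a, b).endPos());  // 2

        // Graph input: the first token already spans 3 positions.
        Tok wide = new Tok("this", 0, 3);
        Tok next = new Tok("is", 3, 1);
        System.out.println(bigramHardcoded(wide, next).endPos()); // 2, ends before "is" does
        System.out.println(bigramSpanning(wide, next).endPos());  // 4, ends with "is"
    }
}
```

With plain input (every posLen == 1) the two agree; they diverge once the first token already spans several positions, which is the situation I suspect we hit here.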
This filter inserts additional tokens (bigrams) around stopwords, so if you have "this is a test" it will create "this this_is is is_a a a_test" and so on. It can be viewed as a "conditional" ShingleFilter.

But it hardcodes the position length of each bigram:

    posLenAttribute.setPositionLength(2); // bigram

If the input is already a graph (posLen != 1), then this will be incorrect.

How does ShingleFilter handle this situation? It would be nice if we could fix this without capturing state or slowing it down.

On Sat, Jun 23, 2012 at 7:47 PM, Apache Jenkins Server <jenk...@builds.apache.org> wrote:
> Build: https://builds.apache.org/job/Lucene-Solr-tests-only-4.x/157/
>
> 1 tests failed.
> REGRESSION: org.apache.lucene.analysis.core.TestRandomChains.testRandomChains
>
> Error Message:
> last stage: inconsistent endOffset at pos=41: 7 vs 19; token=i_i i i i i u i
> i u f i i u f d i i u f d s i i u f d s s i i u f d s s j i i u f d s s j g i
> i u f d s s j g n i i u f d s s j g n 1 i i u i u f i u f d i u f d s i u f d
> s s i u f d s s j i u f d s s j g i u f d s s j g n i u f d s s j g n 1 u u f
> u f d u f d s u f d s s u f d s s j u f d s s j g u f d s s j g n u f d s s j
> g n 1 f f d f d s f d s s f d s s j f d s s j g f d s s j g n f d s s j g n 1
>
> Stack Trace:
> java.lang.IllegalStateException: last stage: inconsistent endOffset at
> pos=41: 7 vs 19; token=i_i i i i i u i i u f i i u f d i i u f d s i i u f d
> s s i i u f d s s j i i u f d s s j g i i u f d s s j g n i i u f d s s j g n
> 1 i i u i u f i u f d i u f d s i u f d s s i u f d s s j i u f d s s j g i u
> f d s s j g n i u f d s s j g n 1 u u f u f d u f d s u f d s s u f d s s j u
> f d s s j g u f d s s j g n u f d s s j g n 1 f f d f d s f d s s f d s s j f
> d s s j g f d s s j g n f d s s j g n 1
>     at __randomizedtesting.SeedInfo.seed([12635ABB4F789F2A:2F8273DA086A82EA]:0)
>     at org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:135)
>     at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:644)
>     at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:554)
>     at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:450)
>     at org.apache.lucene.analysis.core.TestRandomChains.testRandomChains(TestRandomChains.java:860)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:616)
>     at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
>     at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
>     at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
>     at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
>     at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
>     at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
>     at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
>     at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>     at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>     at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
>     at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
>     at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>     at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821)
>     at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
>     at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669)
>     at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695)
>     at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734)
>     at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745)
>     at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>     at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
>     at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
>     at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
>     at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>     at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
>     at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
>     at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
>     at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>     at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56)
>     at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
>     at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
>     at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)
>
> Build Log:
> [...truncated 4723 lines...]
> [junit4]
> [junit4] Suite: org.apache.lucene.analysis.shingle.ShingleAnalyzerWrapperTest
> [junit4] Completed on J0 in 0.30s, 9 tests
> [junit4]
> [junit4] Suite: org.apache.lucene.analysis.hi.TestHindiNormalizer
> [junit4] Completed on J1 in 0.02s, 3 tests
> [junit4]
> [junit4] Suite: org.apache.lucene.analysis.snowball.TestSnowballVocab
> [junit4] Completed on J1 in 3.48s, 1 test
> [junit4]
> [junit4] Suite: org.apache.lucene.analysis.hy.TestArmenianAnalyzer
> [junit4] Completed on J0 in 3.05s, 4 tests
> [junit4]
> [junit4] Suite: org.apache.lucene.analysis.miscellaneous.TestRemoveDuplicatesTokenFilter
> [junit4] Completed on J0 in 7.19s, 5 tests
> [junit4]
> [junit4] Suite: org.apache.lucene.analysis.lv.TestLatvianAnalyzer
> [junit4] Completed on J0 in 1.13s, 4 tests
> [junit4]
> [junit4] Suite: org.apache.lucene.analysis.util.TestCharArraySet
> [junit4] Completed on J0 in 0.09s, 17 tests
> [junit4]
> [junit4] Suite: org.apache.lucene.analysis.de.TestGermanNormalizationFilter
> [junit4] Completed on J0 in 1.75s, 5 tests
> [junit4]
> [junit4] Suite: org.apache.lucene.analysis.core.TestRandomChains
> [junit4] ERROR 4.63s J0 | TestRandomChains.testRandomChains
> [junit4] > Throwable #1: java.lang.IllegalStateException: last stage:
> inconsistent endOffset at pos=41: 7 vs 19; token=i_i i i i i u i i u f i i u
> f d i i u f d s i i u f d s s i i u f d s s j i i u f d s s j g i i u f d s s
> j g n i i u f d s s j g n 1 i i u i u f i u f d i u f d s i u f d s s i u f d
> s s j i u f d s s j g i u f d s s j g n i u f d s s j g n 1 u u f u f d u f d
> s u f d s s u f d s s j u f d s s j g u f d s s j g n u f d s s j g n 1 f f d
> f d s f d s s f d s s j f d s s j g f d s s j g n f d s s j g n 1
> [junit4] >     at __randomizedtesting.SeedInfo.seed([12635ABB4F789F2A:2F8273DA086A82EA]:0)
> [junit4] >     at org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:135)
> [junit4] >     at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:644)
> [junit4] >     at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:554)
> [junit4] >     at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:450)
> [junit4] >     at org.apache.lucene.analysis.core.TestRandomChains.testRandomChains(TestRandomChains.java:860)
> [junit4] >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [junit4] >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> [junit4] >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> [junit4] >     at java.lang.reflect.Method.invoke(Method.java:616)
> [junit4] >     at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
> [junit4] >     at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
> [junit4] >     at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
> [junit4] >     at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
> [junit4] >     at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
> [junit4] >     at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
> [junit4] >     at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
> [junit4] >     at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
> [junit4] >     at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> [junit4] >     at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
> [junit4] >     at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
> [junit4] >     at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
> [junit4] >     at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821)
> [junit4] >     at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
> [junit4] >     at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669)
> [junit4] >     at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695)
> [junit4] >     at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734)
> [junit4] >     at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745)
> [junit4] >     at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
> [junit4] >     at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
> [junit4] >     at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
> [junit4] >     at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
> [junit4] >     at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> [junit4] >     at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
> [junit4] >     at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
> [junit4] >     at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
> [junit4] >     at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
> [junit4] >     at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56)
> [junit4] >     at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
> [junit4] >     at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
> [junit4] >     at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)
> [junit4] >
> [junit4] 2> TEST FAIL: useCharFilter=false text='Qa gciiu fdsasjgn 1'
> [junit4] 2> Exception from random analyzer:
> [junit4] 2> charfilters=
> [junit4] 2>   org.apache.lucene.analysis.pattern.PatternReplaceCharFilter(a, , org.apache.lucene.analysis.CharReader@2b03948d)
> [junit4] 2>   org.apache.lucene.analysis.pattern.PatternReplaceCharFilter(a, cnoc, org.apache.lucene.analysis.pattern.PatternReplaceCharFilter@780dcb0a)
> [junit4] 2> tokenizer=
> [junit4] 2>   org.apache.lucene.analysis.MockTokenizer(org.apache.lucene.util.AttributeSource$AttributeFactory$DefaultAttributeFactory@2c8bb731, org.apache.lucene.analysis.core.TestRandomChains$CheckThatYouDidntReadAnythingReaderWrapper@518d3b19, initial state: 1
> [junit4] 2> state 0 [accept]:
> [junit4] 2>   \U00000000-\U00000008 -> 0
> [junit4] 2>   \U0000000b-\U0000000c -> 0
> [junit4] 2>   \U0000000e-\U0000001f -> 0
> [junit4] 2>   !-\U0010ffff -> 0
> [junit4] 2> state 1 [reject]:
> [junit4] 2>   \U00000000-\U00000008 -> 0
> [junit4] 2>   \U0000000b-\U0000000c -> 0
> [junit4] 2>   \U0000000e-\U0000001f -> 0
> [junit4] 2>   !-\U0010ffff -> 0
> [junit4] 2> , false, -38)
> [junit4] 2> filters=
> [junit4] 2>   org.apache.lucene.analysis.shingle.ShingleFilter(org.apache.lucene.analysis.ValidatingTokenFilter@37caea, 17)
> [junit4] 2>   org.apache.lucene.analysis.shingle.ShingleFilter(org.apache.lucene.analysis.ValidatingTokenFilter@37caea, 38, 67)
> [junit4] 2>   org.apache.lucene.analysis.commongrams.CommonGramsFilter(LUCENE_40, org.apache.lucene.analysis.ValidatingTokenFilter@37caea, [idm, agy, chh, bljfkyl, i, sqh, y, suth])
> [junit4] 2> offsetsAreCorrect=true
> [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestRandomChains -Dtests.method=testRandomChains -Dtests.seed=12635ABB4F789F2A -Dtests.multiplier=3 -Dtests.locale=pt -Dtests.timezone=America/Argentina/Salta -Dargs="-Dfile.encoding=ISO8859-1"
> [junit4] 2>
> [junit4] > (@AfterClass output)
> [junit4] 2> NOTE: test params are: codec=Appending, sim=RandomSimilarityProvider(queryNorm=true,coord=false): {dummy=IB LL-L3(800.0)}, locale=pt, timezone=America/Argentina/Salta
> [junit4] 2> NOTE: FreeBSD 9.0-RELEASE amd64/Sun Microsystems Inc. 1.6.0_32 (64-bit)/cpus=16,threads=1,free=208676832,total=241041408
> [junit4] 2> NOTE: All tests run in this JVM: [TestWordDelimiterFilter, PositionFilterTest, TestStopFilter, TestCollationKeyFilter, ShingleAnalyzerWrapperTest, TestArmenianAnalyzer, TestRemoveDuplicatesTokenFilter, TestLatvianAnalyzer, TestCharArraySet, TestGermanNormalizationFilter, TestRandomChains]
> [junit4] 2>
> [junit4] Completed on J0 in 5.00s, 1 test, 1 error <<< FAILURES!
> [...truncated 407 lines...]
>
> [...truncated 5231 lines...]
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org

--
lucidimagination.com