Hmm, I suspect this is a bug in the position length implementation of
CommonGramsFilter.

This filter inserts additional tokens (bigrams) around stopwords: for
"this is a test" it will create "this this_is is is_a a a_test" and so
on. It can be viewed as a "conditional" ShingleFilter.
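
To make the "conditional shingle" idea concrete, here is a rough sketch in
plain Java. It is not the Lucene implementation: the class name
CommonGramsSketch, the method commonGrams, and the common-word set {is, a}
are all made up for illustration, and it only models the terms, not
offsets or positions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; this is NOT Lucene's CommonGramsFilter.
public class CommonGramsSketch {

    // Emits every unigram, followed by a bigram "a_b" whenever
    // either a or b (adjacent tokens) is in the common-word set.
    public static List<String> commonGrams(List<String> tokens, Set<String> commonWords) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String current = tokens.get(i);
            out.add(current);
            if (i + 1 < tokens.size()) {
                String next = tokens.get(i + 1);
                if (commonWords.contains(current) || commonWords.contains(next)) {
                    out.add(current + "_" + next);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed common-word set for the example sentence.
        Set<String> common = new HashSet<>(Arrays.asList("is", "a"));
        List<String> tokens = Arrays.asList("this", "is", "a", "test");
        // prints [this, this_is, is, is_a, a, a_test, test]
        System.out.println(commonGrams(tokens, common));
    }
}
```

A plain ShingleFilter would emit the bigram for every adjacent pair; the
only difference in this sketch is the common-word check.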

But it hardcodes the position length:

  posLenAttribute.setPositionLength(2); // bigram

If the input is already a graph (posLen != 1), then this will be
incorrect. How does ShingleFilter handle this situation? It would be
nice if we could fix this without capturing state or slowing it down.
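
To illustrate why the constant 2 breaks on graph input, here is a small
sketch in plain Java. It is not a proposed patch, and the names
(BigramPositionLength, bigramPositionLength) are hypothetical. It assumes
the second token starts at the node where the first one ends, which
consecutive tokens in a real stream need not do; under that assumption
the bigram has to span both tokens, so its position length is the sum of
theirs.

```java
// Hypothetical illustration only; not Lucene code and not a proposed fix.
public class BigramPositionLength {

    // Position length of a bigram built from two tokens, ASSUMING the second
    // token starts at the graph node where the first token ends.
    public static int bigramPositionLength(int firstPosLen, int secondPosLen) {
        if (firstPosLen < 1 || secondPosLen < 1) {
            throw new IllegalArgumentException("position length must be >= 1");
        }
        return firstPosLen + secondPosLen;
    }

    public static void main(String[] args) {
        // Linear input: both tokens have posLen == 1, so the hardcoded 2 is right.
        System.out.println(bigramPositionLength(1, 1)); // prints 2

        // Graph input: the first token already spans 3 positions (for example a
        // shingle from an upstream filter), so the hardcoded 2 would be too short.
        System.out.println(bigramPositionLength(3, 1)); // prints 4
    }
}
```

Doing this inside the filter would mean remembering the previous token's
position length, which is some extra per-token state but much less than a
full captureState().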

On Sat, Jun 23, 2012 at 7:47 PM, Apache Jenkins Server
<jenk...@builds.apache.org> wrote:
> Build: https://builds.apache.org/job/Lucene-Solr-tests-only-4.x/157/
>
> 1 tests failed.
> REGRESSION:  org.apache.lucene.analysis.core.TestRandomChains.testRandomChains
>
> Error Message:
> last stage: inconsistent endOffset at pos=41: 7 vs 19; token=i_i i i i i u i 
> i u f i i u f d i i u f d s i i u f d s s i i u f d s s j i i u f d s s j g i 
> i u f d s s j g n i i u f d s s j g n 1 i i u i u f i u f d i u f d s i u f d 
> s s i u f d s s j i u f d s s j g i u f d s s j g n i u f d s s j g n 1 u u f 
> u f d u f d s u f d s s u f d s s j u f d s s j g u f d s s j g n u f d s s j 
> g n 1 f f d f d s f d s s f d s s j f d s s j g f d s s j g n f d s s j g n 1
>
> Stack Trace:
> java.lang.IllegalStateException: last stage: inconsistent endOffset at 
> pos=41: 7 vs 19; token=i_i i i i i u i i u f i i u f d i i u f d s i i u f d 
> s s i i u f d s s j i i u f d s s j g i i u f d s s j g n i i u f d s s j g n 
> 1 i i u i u f i u f d i u f d s i u f d s s i u f d s s j i u f d s s j g i u 
> f d s s j g n i u f d s s j g n 1 u u f u f d u f d s u f d s s u f d s s j u 
> f d s s j g u f d s s j g n u f d s s j g n 1 f f d f d s f d s s f d s s j f 
> d s s j g f d s s j g n f d s s j g n 1
>        at 
> __randomizedtesting.SeedInfo.seed([12635ABB4F789F2A:2F8273DA086A82EA]:0)
>        at 
> org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:135)
>        at 
> org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:644)
>        at 
> org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:554)
>        at 
> org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:450)
>        at 
> org.apache.lucene.analysis.core.TestRandomChains.testRandomChains(TestRandomChains.java:860)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:616)
>        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
>        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
>        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
>        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
>        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
>        at 
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
>        at 
> org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
>        at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>        at 
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>        at 
> org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
>        at 
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
>        at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821)
>        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
>        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669)
>        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695)
>        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734)
>        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745)
>        at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>        at 
> org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
>        at 
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
>        at 
> org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
>        at 
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>        at 
> org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
>        at 
> org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
>        at 
> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
>        at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>        at 
> org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56)
>        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
>        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
>        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)
>
>
>
>
> Build Log:
> [...truncated 4723 lines...]
>   [junit4]
>   [junit4] Suite: 
> org.apache.lucene.analysis.shingle.ShingleAnalyzerWrapperTest
>   [junit4] Completed on J0 in 0.30s, 9 tests
>   [junit4]
>   [junit4] Suite: org.apache.lucene.analysis.hi.TestHindiNormalizer
>   [junit4] Completed on J1 in 0.02s, 3 tests
>   [junit4]
>   [junit4] Suite: org.apache.lucene.analysis.snowball.TestSnowballVocab
>   [junit4] Completed on J1 in 3.48s, 1 test
>   [junit4]
>   [junit4] Suite: org.apache.lucene.analysis.hy.TestArmenianAnalyzer
>   [junit4] Completed on J0 in 3.05s, 4 tests
>   [junit4]
>   [junit4] Suite: 
> org.apache.lucene.analysis.miscellaneous.TestRemoveDuplicatesTokenFilter
>   [junit4] Completed on J0 in 7.19s, 5 tests
>   [junit4]
>   [junit4] Suite: org.apache.lucene.analysis.lv.TestLatvianAnalyzer
>   [junit4] Completed on J0 in 1.13s, 4 tests
>   [junit4]
>   [junit4] Suite: org.apache.lucene.analysis.util.TestCharArraySet
>   [junit4] Completed on J0 in 0.09s, 17 tests
>   [junit4]
>   [junit4] Suite: org.apache.lucene.analysis.de.TestGermanNormalizationFilter
>   [junit4] Completed on J0 in 1.75s, 5 tests
>   [junit4]
>   [junit4] Suite: org.apache.lucene.analysis.core.TestRandomChains
>   [junit4] ERROR   4.63s J0 | TestRandomChains.testRandomChains
>   [junit4]    > Throwable #1: java.lang.IllegalStateException: last stage: 
> inconsistent endOffset at pos=41: 7 vs 19; token=i_i i i i i u i i u f i i u 
> f d i i u f d s i i u f d s s i i u f d s s j i i u f d s s j g i i u f d s s 
> j g n i i u f d s s j g n 1 i i u i u f i u f d i u f d s i u f d s s i u f d 
> s s j i u f d s s j g i u f d s s j g n i u f d s s j g n 1 u u f u f d u f d 
> s u f d s s u f d s s j u f d s s j g u f d s s j g n u f d s s j g n 1 f f d 
> f d s f d s s f d s s j f d s s j g f d s s j g n f d s s j g n 1
>   [junit4]    >        at 
> __randomizedtesting.SeedInfo.seed([12635ABB4F789F2A:2F8273DA086A82EA]:0)
>   [junit4]    >        at 
> org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:135)
>   [junit4]    >        at 
> org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:644)
>   [junit4]    >        at 
> org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:554)
>   [junit4]    >        at 
> org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:450)
>   [junit4]    >        at 
> org.apache.lucene.analysis.core.TestRandomChains.testRandomChains(TestRandomChains.java:860)
>   [junit4]    >        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>   [junit4]    >        at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   [junit4]    >        at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   [junit4]    >        at java.lang.reflect.Method.invoke(Method.java:616)
>   [junit4]    >        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
>   [junit4]    >        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
>   [junit4]    >        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
>   [junit4]    >        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
>   [junit4]    >        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
>   [junit4]    >        at 
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
>   [junit4]    >        at 
> org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
>   [junit4]    >        at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>   [junit4]    >        at 
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>   [junit4]    >        at 
> org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
>   [junit4]    >        at 
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
>   [junit4]    >        at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>   [junit4]    >        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821)
>   [junit4]    >        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
>   [junit4]    >        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669)
>   [junit4]    >        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695)
>   [junit4]    >        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734)
>   [junit4]    >        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745)
>   [junit4]    >        at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>   [junit4]    >        at 
> org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
>   [junit4]    >        at 
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
>   [junit4]    >        at 
> org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
>   [junit4]    >        at 
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>   [junit4]    >        at 
> org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
>   [junit4]    >        at 
> org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
>   [junit4]    >        at 
> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
>   [junit4]    >        at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>   [junit4]    >        at 
> org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56)
>   [junit4]    >        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
>   [junit4]    >        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
>   [junit4]    >        at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)
>   [junit4]    >
>   [junit4]   2> TEST FAIL: useCharFilter=false text='Qa gciiu fdsasjgn 1'
>   [junit4]   2> Exception from random analyzer:
>   [junit4]   2> charfilters=
>   [junit4]   2>   
> org.apache.lucene.analysis.pattern.PatternReplaceCharFilter(a, , 
> org.apache.lucene.analysis.CharReader@2b03948d)
>   [junit4]   2>   
> org.apache.lucene.analysis.pattern.PatternReplaceCharFilter(a, cnoc, 
> org.apache.lucene.analysis.pattern.PatternReplaceCharFilter@780dcb0a)
>   [junit4]   2> tokenizer=
>   [junit4]   2>   
> org.apache.lucene.analysis.MockTokenizer(org.apache.lucene.util.AttributeSource$AttributeFactory$DefaultAttributeFactory@2c8bb731,
>  
> org.apache.lucene.analysis.core.TestRandomChains$CheckThatYouDidntReadAnythingReaderWrapper@518d3b19,
>  initial state: 1
>   [junit4]   2> state 0 [accept]:
>   [junit4]   2>  \\U00000000-\\U00000008 -> 0
>   [junit4]   2>  \\U0000000b-\\U0000000c -> 0
>   [junit4]   2>  \\U0000000e-\\U0000001f -> 0
>   [junit4]   2>  !-\\U0010ffff -> 0
>   [junit4]   2> state 1 [reject]:
>   [junit4]   2>  \\U00000000-\\U00000008 -> 0
>   [junit4]   2>  \\U0000000b-\\U0000000c -> 0
>   [junit4]   2>  \\U0000000e-\\U0000001f -> 0
>   [junit4]   2>  !-\\U0010ffff -> 0
>   [junit4]   2> , false, -38)
>   [junit4]   2> filters=
>   [junit4]   2>   
> org.apache.lucene.analysis.shingle.ShingleFilter(org.apache.lucene.analysis.ValidatingTokenFilter@37caea,
>  17)
>   [junit4]   2>   
> org.apache.lucene.analysis.shingle.ShingleFilter(org.apache.lucene.analysis.ValidatingTokenFilter@37caea,
>  38, 67)
>   [junit4]   2>   
> org.apache.lucene.analysis.commongrams.CommonGramsFilter(LUCENE_40, 
> org.apache.lucene.analysis.ValidatingTokenFilter@37caea, [idm, agy, chh, 
> bljfkyl, i, sqh, y, suth])
>   [junit4]   2> offsetsAreCorrect=true
>   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestRandomChains 
> -Dtests.method=testRandomChains -Dtests.seed=12635ABB4F789F2A 
> -Dtests.multiplier=3 -Dtests.locale=pt 
> -Dtests.timezone=America/Argentina/Salta -Dargs="-Dfile.encoding=ISO8859-1"
>   [junit4]   2>
>   [junit4]    > (@AfterClass output)
>   [junit4]   2> NOTE: test params are: codec=Appending, 
> sim=RandomSimilarityProvider(queryNorm=true,coord=false): {dummy=IB 
> LL-L3(800.0)}, locale=pt, timezone=America/Argentina/Salta
>   [junit4]   2> NOTE: FreeBSD 9.0-RELEASE amd64/Sun Microsystems Inc. 
> 1.6.0_32 (64-bit)/cpus=16,threads=1,free=208676832,total=241041408
>   [junit4]   2> NOTE: All tests run in this JVM: [TestWordDelimiterFilter, 
> PositionFilterTest, TestStopFilter, TestCollationKeyFilter, 
> ShingleAnalyzerWrapperTest, TestArmenianAnalyzer, 
> TestRemoveDuplicatesTokenFilter, TestLatvianAnalyzer, TestCharArraySet, 
> TestGermanNormalizationFilter, TestRandomChains]
>   [junit4]   2>
>   [junit4] Completed on J0 in 5.00s, 1 test, 1 error <<< FAILURES!
> [...truncated 407 lines...]
>
> [...truncated 5231 lines...]
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org



-- 
lucidimagination.com
