I see the bug and can reproduce it. The problem is that there is some Thai text, and it runs through KeywordRepeatFilter first (adding a synonym of itself to every token).
Like this: "ab" -> ["ab", "ab"]. Then ThaiWordFilter comes along and splits both of these tokens: ["a0", "b0", "a1", "b1"]. This makes offsets go backwards. When ShingleFilter then processes this stream and shingles b0 with a1, the offsets are nonsensical because endOffset < startOffset. I don't think we should hack around this: ThaiWordFilter is really a tokenizer and should not be a TokenFilter. There is an issue for that; I will take it.

On Wed, Mar 19, 2014 at 11:31 AM, Robert Muir <[email protected]> wrote: > this fail is just because i increased this test to try harder (it > takes multiplier into account etc) > > thai + shingles looks suspicious. I'll take a look in a bit. > > On Wed, Mar 19, 2014 at 11:26 AM, Policeman Jenkins Server > <[email protected]> wrote: >> Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/9741/ >> Java: 64bit/jdk1.7.0_51 -XX:-UseCompressedOops -XX:+UseParallelGC >> -XX:-UseSuperWord >> >> 1 tests failed. >> REGRESSION: >> org.apache.lucene.analysis.core.TestRandomChains.testRandomChains >> >> Error Message: >> startOffset must be non-negative, and endOffset must be >= startOffset, >> startOffset=2,endOffset=1 >> >> Stack Trace: >> java.lang.IllegalArgumentException: startOffset must be non-negative, and >> endOffset must be >= startOffset, startOffset=2,endOffset=1 >> at >> __randomizedtesting.SeedInfo.seed([B5B16FF77330D152:885046963422CC92]:0) >> at >> org.apache.lucene.analysis.tokenattributes.OffsetAttributeImpl.setOffset(OffsetAttributeImpl.java:45) >> at >> org.apache.lucene.analysis.shingle.ShingleFilter.incrementToken(ShingleFilter.java:345) >> at >> org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:78) >> at >> org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:694) >> at >> org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:605) >> at >> 
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:506) >> at >> org.apache.lucene.analysis.core.TestRandomChains.testRandomChains(TestRandomChains.java:925) >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >> at >> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >> at >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >> at java.lang.reflect.Method.invoke(Method.java:606) >> at >> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1617) >> at >> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:826) >> at >> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:862) >> at >> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:876) >> at >> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) >> at >> org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) >> at >> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) >> at >> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) >> at >> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) >> at >> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) >> at >> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) >> at >> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) >> at >> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359) >> at >> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:783) >> at >> 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:443) >> at >> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:835) >> at >> com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:737) >> at >> com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:771) >> at >> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:782) >> at >> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) >> at >> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) >> at >> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) >> at >> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) >> at >> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) >> at >> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) >> at >> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) >> at >> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) >> at >> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) >> at >> org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) >> at >> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) >> at >> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359) >> at java.lang.Thread.run(Thread.java:744) >> >> >> >> >> Build Log: >> [...truncated 5405 lines...] 
>> [junit4] Suite: org.apache.lucene.analysis.core.TestRandomChains >> [junit4] 2> TEST FAIL: useCharFilter=false text='\u0e46\u0e31\u0e23 >> begc' >> [junit4] 2> Exception from random analyzer: >> [junit4] 2> charfilters= >> [junit4] 2> >> org.apache.lucene.analysis.charfilter.HTMLStripCharFilter(java.io.StringReader@612c1515, >> [<IDEOGRAPHIC>, <SOUTHEAST_ASIAN>, <COMPANY>]) >> [junit4] 2> tokenizer= >> [junit4] 2> >> org.apache.lucene.analysis.core.WhitespaceTokenizer(LUCENE_48, >> org.apache.lucene.analysis.core.TestRandomChains$CheckThatYouDidntReadAnythingReaderWrapper@58a2dc91) >> [junit4] 2> filters= >> [junit4] 2> >> org.apache.lucene.analysis.payloads.NumericPayloadTokenFilter(ValidatingTokenFilter@5429332c >> term=,bytes=[],startOffset=0,endOffset=0,payload=null,type=word, 0.5111438, >> wuxmgqvdw) >> [junit4] 2> >> org.apache.lucene.analysis.miscellaneous.KeywordRepeatFilter(ValidatingTokenFilter@223b2674 >> >> term=,bytes=[],startOffset=0,endOffset=0,payload=null,type=word,keyword=false,positionIncrement=1) >> [junit4] 2> org.apache.lucene.analysis.th.ThaiWordFilter(LUCENE_48, >> ValidatingTokenFilter@9de2aaa >> term=,bytes=[],startOffset=0,endOffset=0,payload=null,type=word,keyword=false,positionIncrement=1) >> [junit4] 2> >> org.apache.lucene.analysis.shingle.ShingleFilter(ValidatingTokenFilter@6b38f759 >> >> term=,bytes=[],startOffset=0,endOffset=0,payload=null,type=word,keyword=false,positionIncrement=1,positionLength=1) >> [junit4] 2> offsetsAreCorrect=false >> [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestRandomChains >> -Dtests.method=testRandomChains -Dtests.seed=B5B16FF77330D152 >> -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=hr_HR >> -Dtests.timezone=GMT0 -Dtests.file.encoding=UTF-8 >> [junit4] ERROR 10.2s J1 | TestRandomChains.testRandomChains <<< >> [junit4] > Throwable #1: java.lang.IllegalArgumentException: >> startOffset must be non-negative, and endOffset must be >= startOffset, >> startOffset=2,endOffset=1 >> 
[junit4] > at >> __randomizedtesting.SeedInfo.seed([B5B16FF77330D152:885046963422CC92]:0) >> [junit4] > at >> org.apache.lucene.analysis.tokenattributes.OffsetAttributeImpl.setOffset(OffsetAttributeImpl.java:45) >> [junit4] > at >> org.apache.lucene.analysis.shingle.ShingleFilter.incrementToken(ShingleFilter.java:345) >> [junit4] > at >> org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:78) >> [junit4] > at >> org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:694) >> [junit4] > at >> org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:605) >> [junit4] > at >> org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:506) >> [junit4] > at >> org.apache.lucene.analysis.core.TestRandomChains.testRandomChains(TestRandomChains.java:925) >> [junit4] > at java.lang.Thread.run(Thread.java:744) >> [junit4] 2> NOTE: test params are: codec=Appending, >> sim=RandomSimilarityProvider(queryNorm=true,coord=no): {dummy=DFR I(F)1}, >> locale=hr_HR, timezone=GMT0 >> [junit4] 2> NOTE: Linux 3.8.0-37-generic amd64/Oracle Corporation >> 1.7.0_51 (64-bit)/cpus=8,threads=1,free=165889424,total=283639808 >> [junit4] 2> NOTE: All tests run in this JVM: [TestFinnishAnalyzer, >> TestTypeTokenFilter, TestFrenchMinimalStemFilterFactory, >> TestCapitalizationFilter, TestGermanStemFilterFactory, >> TestKeywordMarkerFilter, TestSoraniNormalizationFilter, >> TestPerFieldAnalyzerWrapper, TestGalicianAnalyzer, >> TestSwedishLightStemFilterFactory, TestCodepointCountFilterFactory, >> TestCJKBigramFilterFactory, TestScandinavianFoldingFilterFactory, >> TestHungarianLightStemFilterFactory, TestReverseStringFilter, >> TestChineseTokenizer, DelimitedPayloadTokenFilterTest, TestArabicAnalyzer, >> TestGalicianStemFilter, CommonGramsFilterTest, TestGermanAnalyzer, >> TestCzechStemmer, TestKeepFilterFactory, TestKeywordAnalyzer, >> 
TestKeywordMarkerFilterFactory, TestCommonGramsFilterFactory, >> TestBulgarianStemmer, TestCharArrayMap, TestRussianAnalyzer, >> TestPorterStemFilterFactory, TestCollationKeyAnalyzer, >> TestLucene47WordDelimiterFilter, TestCharArrayIterator, >> TestDictionaryCompoundWordTokenFilterFactory, TestGreekStemmer, >> TestSynonymMapFilter, TestRollingCharBuffer, TestStandardAnalyzer, >> TestCompoundWordTokenFilter, TestSwedishLightStemFilter, >> TokenTypeSinkTokenizerTest, TestCJKBigramFilter, TestEnglishAnalyzer, >> TestAnalyzers, TestHungarianLightStemFilter, TestHindiFilters, >> TestChineseFilterFactory, TestWordlistLoader, >> TestScandinavianNormalizationFilterFactory, TestKStemmer, >> DateRecognizerSinkTokenizerTest, TestPortugueseMinimalStemFilter, >> HTMLStripCharFilterTest, TestAllAnalyzersHaveFactories, >> TestPortugueseLightStemFilterFactory, TestScandinavianFoldingFilter, >> TestSpanishAnalyzer, TestSnowballPorterFilterFactory, TestCharTokenizers, >> TestDependencies, TestBrazilianStemmer, TestClassicAnalyzer, >> TestEnglishMinimalStemFilterFactory, TestFilesystemResourceLoader, >> TestTrimFilterFactory, TestSpanishLightStemFilter, TestFactories, >> TestSnowball, TestRandomChains] >> [junit4] Completed on J1 in 38.37s, 2 tests, 1 error <<< FAILURES! >> >> [...truncated 412 lines...] 
>> BUILD FAILED >> /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:467: The >> following error occurred while executing this line: >> /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:447: The >> following error occurred while executing this line: >> /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:45: The following >> error occurred while executing this line: >> /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/extra-targets.xml:37: The >> following error occurred while executing this line: >> /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/build.xml:539: The >> following error occurred while executing this line: >> /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/common-build.xml:1996: >> The following error occurred while executing this line: >> /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/analysis/build.xml:106: >> The following error occurred while executing this line: >> /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/analysis/build.xml:38: >> The following error occurred while executing this line: >> /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/module-build.xml:60: >> The following error occurred while executing this line: >> /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/common-build.xml:1276: >> The following error occurred while executing this line: >> /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/common-build.xml:908: >> There were test failures: 261 suites, 1424 tests, 1 error, 1 ignored >> >> Total time: 19 minutes 2 seconds >> Build step 'Invoke Ant' marked build as failure >> Description set: Java: 64bit/jdk1.7.0_51 -XX:-UseCompressedOops >> -XX:+UseParallelGC -XX:-UseSuperWord >> Archiving artifacts >> Recording test results >> Email was triggered for: Failure >> Sending email for trigger: Failure
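
To make the offset problem described at the top of this mail concrete, here is a minimal plain-Java sketch. It does not use the Lucene API: Tok, repeat, split and shingleOffsets are hypothetical stand-ins for the token attributes, the repeat step, the word-splitting step and a bigram shingle. Splitting per character and the three-character input "abc" are assumptions, chosen only so that the inversion is strict. Position increments are ignored entirely.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Plain-Java simulation; none of these types are Lucene classes.
public class OffsetsGoBackwards {

    // Hypothetical stand-in for a token: a term plus its character offsets.
    static final class Tok {
        final String term;
        final int start;
        final int end;

        Tok(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    // Mimics the repeat step: every incoming token is emitted twice.
    static List<Tok> repeat(List<Tok> in) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : in) {
            out.add(t);
            out.add(new Tok(t.term, t.start, t.end));
        }
        return out;
    }

    // Mimics a filter that splits each token into smaller pieces.
    // Assumption: real word segmentation is replaced by per-character splitting.
    static List<Tok> split(List<Tok> in) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : in) {
            for (int i = 0; i < t.term.length(); i++) {
                out.add(new Tok(t.term.substring(i, i + 1), t.start + i, t.start + i + 1));
            }
        }
        return out;
    }

    // Mimics a bigram shingle: start offset of the first token,
    // end offset of the second token.
    static List<int[]> shingleOffsets(List<Tok> in) {
        List<int[]> out = new ArrayList<>();
        for (int i = 0; i + 1 < in.size(); i++) {
            out.add(new int[] {in.get(i).start, in.get(i + 1).end});
        }
        return out;
    }

    public static void main(String[] args) {
        // "abc" -> ["abc", "abc"] -> [a0, b0, c0, a1, b1, c1]
        List<Tok> tokens = split(repeat(Collections.singletonList(new Tok("abc", 0, 3))));
        for (int[] o : shingleOffsets(tokens)) {
            String verdict = o[1] < o[0] ? "  <-- endOffset < startOffset" : "";
            System.out.println("startOffset=" + o[0] + ",endOffset=" + o[1] + verdict);
        }
    }
}
```

With this input the third shingle (c0 with a1) comes out as startOffset=2,endOffset=1, the same kind of inversion that OffsetAttributeImpl.setOffset rejects in the trace above. Nothing downstream can repair this, because the split pieces of the second copy start over at offset 0 after the first copy has already reached the end of the original token.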
