I see the bug and can reproduce it. The problem is that the input
contains some Thai text, and it runs through KeywordRepeatFilter first
(which adds a synonym of itself to every token).

Like this: "ab" -> ["ab", "ab"]

Then ThaiWordFilter comes along and splits both of these tokens:
["a0", "b0", "a1", "b1"]. This makes the offsets go backwards.
When ShingleFilter jumps in and b0 and a1 are shingled, the offsets
are senseless because endOffset < startOffset.

I don't think we should hack around this: the Thai filter is really a
tokenizer and should not be a TokenFilter. There is an issue for that,
and I will take it.
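
A minimal sketch of the ordering problem, in plain Java with no Lucene
classes. Everything in it (OffsetSketch, Token, repeat, split, shingle,
isValid) is a hypothetical stand-in, not the real filter code: split
cuts a token into one-character pieces instead of doing real Thai word
segmentation, and shingle ignores position increments, which the real
ShingleFilter does not. It uses a three-piece input "abc" rather than
"ab", because in this toy model two pieces only make the offsets equal,
not inverted.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these types are Lucene classes.
class OffsetSketch {

    /** A term plus its character offsets into the original text. */
    static final class Token {
        final String term;
        final int start;
        final int end;

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return term + "[" + start + "," + end + "]";
        }
    }

    /** Emits every token twice, standing in for a keyword-repeat step. */
    static List<Token> repeat(List<Token> in) {
        List<Token> out = new ArrayList<>();
        for (Token t : in) {
            out.add(t);
            out.add(t);
        }
        return out;
    }

    /** Splits every token into one-character pieces, standing in for word segmentation. */
    static List<Token> split(List<Token> in) {
        List<Token> out = new ArrayList<>();
        for (Token t : in) {
            for (int i = 0; i < t.term.length(); i++) {
                out.add(new Token(t.term.substring(i, i + 1), t.start + i, t.start + i + 1));
            }
        }
        return out;
    }

    /** Joins adjacent tokens into bigrams: start of the first, end of the second. */
    static List<Token> shingle(List<Token> in) {
        List<Token> out = new ArrayList<>();
        for (int i = 0; i + 1 < in.size(); i++) {
            Token a = in.get(i);
            Token b = in.get(i + 1);
            out.add(new Token(a.term + " " + b.term, a.start, b.end));
        }
        return out;
    }

    /** The condition stated in the exception message of the report. */
    static boolean isValid(Token t) {
        return t.start >= 0 && t.end >= t.start;
    }

    public static void main(String[] args) {
        List<Token> input = new ArrayList<>();
        input.add(new Token("abc", 0, 3));

        // Repeat first, then segment: the order in the failing chain.
        for (Token t : shingle(split(repeat(input)))) {
            System.out.println("repeat->split: " + t + (isValid(t) ? "" : "  <-- invalid"));
        }
        // Segment first, then repeat: the order a tokenizer would force.
        for (Token t : shingle(repeat(split(input)))) {
            System.out.println("split->repeat: " + t + (isValid(t) ? "" : "  <-- invalid"));
        }
    }
}
```

In the repeat-then-split order, the shingle built from the last piece of
the first copy and the first piece of the second copy ("c a") gets
start=2 and end=1, the same values as in the exception message, though
that match alone does not prove the real chain took this exact path. In
the split-then-repeat order every shingle in this toy model has
end >= start, which is the intuition behind doing the segmentation in
the tokenizer.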

On Wed, Mar 19, 2014 at 11:31 AM, Robert Muir <[email protected]> wrote:
> this fail is just because i increased this test to try harder (it
> takes multiplier into account etc)
>
> thai + shingles looks suspicious. I'll take a look in a bit.
>
> On Wed, Mar 19, 2014 at 11:26 AM, Policeman Jenkins Server
> <[email protected]> wrote:
>> Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/9741/
>> Java: 64bit/jdk1.7.0_51 -XX:-UseCompressedOops -XX:+UseParallelGC 
>> -XX:-UseSuperWord
>>
>> 1 tests failed.
>> REGRESSION:  
>> org.apache.lucene.analysis.core.TestRandomChains.testRandomChains
>>
>> Error Message:
>> startOffset must be non-negative, and endOffset must be >= startOffset, 
>> startOffset=2,endOffset=1
>>
>> Stack Trace:
>> java.lang.IllegalArgumentException: startOffset must be non-negative, and 
>> endOffset must be >= startOffset, startOffset=2,endOffset=1
>>         at 
>> __randomizedtesting.SeedInfo.seed([B5B16FF77330D152:885046963422CC92]:0)
>>         at 
>> org.apache.lucene.analysis.tokenattributes.OffsetAttributeImpl.setOffset(OffsetAttributeImpl.java:45)
>>         at 
>> org.apache.lucene.analysis.shingle.ShingleFilter.incrementToken(ShingleFilter.java:345)
>>         at 
>> org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:78)
>>         at 
>> org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:694)
>>         at 
>> org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:605)
>>         at 
>> org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:506)
>>         at 
>> org.apache.lucene.analysis.core.TestRandomChains.testRandomChains(TestRandomChains.java:925)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at 
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>         at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>         at java.lang.reflect.Method.invoke(Method.java:606)
>>         at 
>> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1617)
>>         at 
>> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:826)
>>         at 
>> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:862)
>>         at 
>> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:876)
>>         at 
>> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
>>         at 
>> org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
>>         at 
>> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
>>         at 
>> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>>         at 
>> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
>>         at 
>> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
>>         at 
>> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>>         at 
>> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>>         at 
>> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359)
>>         at 
>> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:783)
>>         at 
>> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:443)
>>         at 
>> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:835)
>>         at 
>> com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:737)
>>         at 
>> com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:771)
>>         at 
>> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:782)
>>         at 
>> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
>>         at 
>> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
>>         at 
>> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>>         at 
>> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
>>         at 
>> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
>>         at 
>> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>>         at 
>> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
>>         at 
>> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>>         at 
>> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
>>         at 
>> org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
>>         at 
>> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>>         at 
>> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359)
>>         at java.lang.Thread.run(Thread.java:744)
>>
>>
>>
>>
>> Build Log:
>> [...truncated 5405 lines...]
>>    [junit4] Suite: org.apache.lucene.analysis.core.TestRandomChains
>>    [junit4]   2> TEST FAIL: useCharFilter=false text='\u0e46\u0e31\u0e23 
>> begc'
>>    [junit4]   2> Exception from random analyzer:
>>    [junit4]   2> charfilters=
>>    [junit4]   2>   
>> org.apache.lucene.analysis.charfilter.HTMLStripCharFilter(java.io.StringReader@612c1515,
>>  [<IDEOGRAPHIC>, <SOUTHEAST_ASIAN>, <COMPANY>])
>>    [junit4]   2> tokenizer=
>>    [junit4]   2>   
>> org.apache.lucene.analysis.core.WhitespaceTokenizer(LUCENE_48, 
>> org.apache.lucene.analysis.core.TestRandomChains$CheckThatYouDidntReadAnythingReaderWrapper@58a2dc91)
>>    [junit4]   2> filters=
>>    [junit4]   2>   
>> org.apache.lucene.analysis.payloads.NumericPayloadTokenFilter(ValidatingTokenFilter@5429332c
>>  term=,bytes=[],startOffset=0,endOffset=0,payload=null,type=word, 0.5111438, 
>> wuxmgqvdw)
>>    [junit4]   2>   
>> org.apache.lucene.analysis.miscellaneous.KeywordRepeatFilter(ValidatingTokenFilter@223b2674
>>  
>> term=,bytes=[],startOffset=0,endOffset=0,payload=null,type=word,keyword=false,positionIncrement=1)
>>    [junit4]   2>   org.apache.lucene.analysis.th.ThaiWordFilter(LUCENE_48, 
>> ValidatingTokenFilter@9de2aaa 
>> term=,bytes=[],startOffset=0,endOffset=0,payload=null,type=word,keyword=false,positionIncrement=1)
>>    [junit4]   2>   
>> org.apache.lucene.analysis.shingle.ShingleFilter(ValidatingTokenFilter@6b38f759
>>  
>> term=,bytes=[],startOffset=0,endOffset=0,payload=null,type=word,keyword=false,positionIncrement=1,positionLength=1)
>>    [junit4]   2> offsetsAreCorrect=false
>>    [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestRandomChains 
>> -Dtests.method=testRandomChains -Dtests.seed=B5B16FF77330D152 
>> -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=hr_HR 
>> -Dtests.timezone=GMT0 -Dtests.file.encoding=UTF-8
>>    [junit4] ERROR   10.2s J1 | TestRandomChains.testRandomChains <<<
>>    [junit4]    > Throwable #1: java.lang.IllegalArgumentException: 
>> startOffset must be non-negative, and endOffset must be >= startOffset, 
>> startOffset=2,endOffset=1
>>    [junit4]    >        at 
>> __randomizedtesting.SeedInfo.seed([B5B16FF77330D152:885046963422CC92]:0)
>>    [junit4]    >        at 
>> org.apache.lucene.analysis.tokenattributes.OffsetAttributeImpl.setOffset(OffsetAttributeImpl.java:45)
>>    [junit4]    >        at 
>> org.apache.lucene.analysis.shingle.ShingleFilter.incrementToken(ShingleFilter.java:345)
>>    [junit4]    >        at 
>> org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:78)
>>    [junit4]    >        at 
>> org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:694)
>>    [junit4]    >        at 
>> org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:605)
>>    [junit4]    >        at 
>> org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:506)
>>    [junit4]    >        at 
>> org.apache.lucene.analysis.core.TestRandomChains.testRandomChains(TestRandomChains.java:925)
>>    [junit4]    >        at java.lang.Thread.run(Thread.java:744)
>>    [junit4]   2> NOTE: test params are: codec=Appending, 
>> sim=RandomSimilarityProvider(queryNorm=true,coord=no): {dummy=DFR I(F)1}, 
>> locale=hr_HR, timezone=GMT0
>>    [junit4]   2> NOTE: Linux 3.8.0-37-generic amd64/Oracle Corporation 
>> 1.7.0_51 (64-bit)/cpus=8,threads=1,free=165889424,total=283639808
>>    [junit4]   2> NOTE: All tests run in this JVM: [TestFinnishAnalyzer, 
>> TestTypeTokenFilter, TestFrenchMinimalStemFilterFactory, 
>> TestCapitalizationFilter, TestGermanStemFilterFactory, 
>> TestKeywordMarkerFilter, TestSoraniNormalizationFilter, 
>> TestPerFieldAnalyzerWrapper, TestGalicianAnalyzer, 
>> TestSwedishLightStemFilterFactory, TestCodepointCountFilterFactory, 
>> TestCJKBigramFilterFactory, TestScandinavianFoldingFilterFactory, 
>> TestHungarianLightStemFilterFactory, TestReverseStringFilter, 
>> TestChineseTokenizer, DelimitedPayloadTokenFilterTest, TestArabicAnalyzer, 
>> TestGalicianStemFilter, CommonGramsFilterTest, TestGermanAnalyzer, 
>> TestCzechStemmer, TestKeepFilterFactory, TestKeywordAnalyzer, 
>> TestKeywordMarkerFilterFactory, TestCommonGramsFilterFactory, 
>> TestBulgarianStemmer, TestCharArrayMap, TestRussianAnalyzer, 
>> TestPorterStemFilterFactory, TestCollationKeyAnalyzer, 
>> TestLucene47WordDelimiterFilter, TestCharArrayIterator, 
>> TestDictionaryCompoundWordTokenFilterFactory, TestGreekStemmer, 
>> TestSynonymMapFilter, TestRollingCharBuffer, TestStandardAnalyzer, 
>> TestCompoundWordTokenFilter, TestSwedishLightStemFilter, 
>> TokenTypeSinkTokenizerTest, TestCJKBigramFilter, TestEnglishAnalyzer, 
>> TestAnalyzers, TestHungarianLightStemFilter, TestHindiFilters, 
>> TestChineseFilterFactory, TestWordlistLoader, 
>> TestScandinavianNormalizationFilterFactory, TestKStemmer, 
>> DateRecognizerSinkTokenizerTest, TestPortugueseMinimalStemFilter, 
>> HTMLStripCharFilterTest, TestAllAnalyzersHaveFactories, 
>> TestPortugueseLightStemFilterFactory, TestScandinavianFoldingFilter, 
>> TestSpanishAnalyzer, TestSnowballPorterFilterFactory, TestCharTokenizers, 
>> TestDependencies, TestBrazilianStemmer, TestClassicAnalyzer, 
>> TestEnglishMinimalStemFilterFactory, TestFilesystemResourceLoader, 
>> TestTrimFilterFactory, TestSpanishLightStemFilter, TestFactories, 
>> TestSnowball, TestRandomChains]
>>    [junit4] Completed on J1 in 38.37s, 2 tests, 1 error <<< FAILURES!
>>
>> [...truncated 412 lines...]
>> BUILD FAILED
>> /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:467: The 
>> following error occurred while executing this line:
>> /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:447: The 
>> following error occurred while executing this line:
>> /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:45: The following 
>> error occurred while executing this line:
>> /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/extra-targets.xml:37: The 
>> following error occurred while executing this line:
>> /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/build.xml:539: The 
>> following error occurred while executing this line:
>> /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/common-build.xml:1996:
>>  The following error occurred while executing this line:
>> /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/analysis/build.xml:106:
>>  The following error occurred while executing this line:
>> /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/analysis/build.xml:38:
>>  The following error occurred while executing this line:
>> /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/module-build.xml:60: 
>> The following error occurred while executing this line:
>> /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/common-build.xml:1276:
>>  The following error occurred while executing this line:
>> /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/common-build.xml:908:
>>  There were test failures: 261 suites, 1424 tests, 1 error, 1 ignored
>>
>> Total time: 19 minutes 2 seconds
>> Build step 'Invoke Ant' marked build as failure
>> Description set: Java: 64bit/jdk1.7.0_51 -XX:-UseCompressedOops 
>> -XX:+UseParallelGC -XX:-UseSuperWord
>> Archiving artifacts
>> Recording test results
>> Email was triggered for: Failure
>> Sending email for trigger: Failure
>>
