[ https://issues.apache.org/jira/browse/LUCENE-7442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cao Manh Dat updated LUCENE-7442: --------------------------------- Attachment: LUCENE-7442.patch It seem that TestRandomChains init MinHashFilter with wrong parameters {code} public MinHashFilter(TokenStream input, int hashCount, int bucketCount, int hashSetSize, boolean withRotation) {code} hashCount, bucketCount, hashSetSize must be positive ones. Here are the patch to fix this issue. > MinHashFilter.FixedSizeTreeSet.add() calls TreeSet.last() without first > testing for emptiness, under which condition NoSuchElementException is thrown > ----------------------------------------------------------------------------------------------------------------------------------------------------- > > Key: LUCENE-7442 > URL: https://issues.apache.org/jira/browse/LUCENE-7442 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis > Reporter: Steve Rowe > Attachments: LUCENE-7442.patch > > > My Jenkins found this reproducing branch_6x seed: > {noformat} > [junit4] Suite: org.apache.lucene.analysis.core.TestRandomChains > [junit4] 2> Exception from random analyzer: > [junit4] 2> charfilters= > [junit4] 2> tokenizer= > [junit4] 2> org.apache.lucene.analysis.standard.StandardTokenizer() > [junit4] 2> filters= > [junit4] 2> > org.apache.lucene.analysis.minhash.MinHashFilter(ValidatingTokenFilter@6ae99167 > > term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word, > 5, 5, -3, true) > [junit4] 2> > org.apache.lucene.analysis.bg.BulgarianStemFilter(ValidatingTokenFilter@40844352 > > term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,keyword=false) > [junit4] 2> offsetsAreCorrect=true > [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestRandomChains > -Dtests.method=testRandomChainsWithLargeStrings -Dtests.seed=4733E677EBDC28FC > -Dtests.slow=true -Dtests.locale=ar-OM > -Dtests.timezone=Atlantic/South_Georgia -Dtests.asserts=true > -Dtests.file.encoding=UTF-8 > [junit4] ERROR 3.18s J4 | > TestRandomChains.testRandomChainsWithLargeStrings <<< > [junit4] > Throwable #1: java.util.NoSuchElementException > [junit4] > at > __randomizedtesting.SeedInfo.seed([4733E677EBDC28FC:2D685966B292080F]:0) > [junit4] > at java.util.TreeMap.key(TreeMap.java:1323) > [junit4] > at java.util.TreeMap.lastKey(TreeMap.java:297) > [junit4] > at java.util.TreeSet.last(TreeSet.java:401) > [junit4] > at > org.apache.lucene.analysis.minhash.MinHashFilter$FixedSizeTreeSet.add(MinHashFilter.java:325) > [junit4] > at > org.apache.lucene.analysis.minhash.MinHashFilter.incrementToken(MinHashFilter.java:159) > [junit4] > at > org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:67) > [junit4] > at > org.apache.lucene.analysis.bg.BulgarianStemFilter.incrementToken(BulgarianStemFilter.java:48) > [junit4] > at > org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:67) > [junit4] > at > org.apache.lucene.analysis.BaseTokenStreamTestCase.checkResetException(BaseTokenStreamTestCase.java:405) > [junit4] > at > org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:510) > [junit4] > at > org.apache.lucene.analysis.core.TestRandomChains.testRandomChainsWithLargeStrings(TestRandomChains.java:959) > [junit4] > at java.lang.Thread.run(Thread.java:745) > [junit4] 2> NOTE: test params are: codec=Asserting(Lucene62): > {dummy=Lucene50(blocksize=128)}, docValues:{}, maxPointsInLeafNode=252, > maxMBSortInHeap=5.297834377897023, sim=ClassicSimilarity, locale=ar-OM, > timezone=Atlantic/South_Georgia > [junit4] 2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation > 1.8.0_77 (64-bit)/cpus=16,threads=1,free=395080152,total=465567744 > [junit4] 2> NOTE: All tests run in this JVM: > [TestDecimalDigitFilterFactory, TestMultiWordSynonyms, > TestReversePathHierarchyTokenizer, TestDoubleEscape, > TestHunspellStemFilterFactory, TestArabicNormalizationFilter, > TestUAX29URLEmailAnalyzer, TestSwedishLightStemFilterFactory, > TestBulgarianStemmer, TestASCIIFoldingFilter, > TestDelimitedPayloadTokenFilterFactory, TestIndonesianStemmer, TestCircumfix, > EdgeNGramTokenFilterTest, TestPatternTokenizer, > TestScandinavianFoldingFilter, TestIgnore, TestRandomChains] > [junit4] Completed [130/272 (1!)] on J4 in 9.85s, 2 tests, 1 error <<< > FAILURES! > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org