[ 
https://issues.apache.org/jira/browse/LUCENE-7442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cao Manh Dat updated LUCENE-7442:
---------------------------------
    Attachment: LUCENE-7442.patch

It seems that TestRandomChains initializes MinHashFilter with invalid parameters:
{code}
public MinHashFilter(TokenStream input, int hashCount, int bucketCount, int hashSetSize, boolean withRotation)
{code}
hashCount, bucketCount, and hashSetSize must all be positive. In the quoted failure, the random chain passed hashSetSize = -3.
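
To illustrate why a non-positive size leads to the NoSuchElementException in the quoted trace, here is a minimal, self-contained sketch. It is not the Lucene source and not the attached patch: BoundedTreeSet, checkPositive, and failsOnFirstAdd are hypothetical names, and the add() logic is only an assumption about how a size-capped TreeSet typically behaves.

```java
import java.util.NoSuchElementException;
import java.util.TreeSet;

public class MinHashArgsSketch {

    // Hypothetical stand-in for a size-capped sorted set; NOT the Lucene class.
    static final class BoundedTreeSet<E extends Comparable<E>> extends TreeSet<E> {
        private static final long serialVersionUID = 1L;
        private final int capacity;

        BoundedTreeSet(int capacity) {
            this.capacity = capacity;
        }

        @Override
        public boolean add(E e) {
            if (size() < capacity) {
                return super.add(e);
            }
            // If capacity <= 0, the set is still empty at this point,
            // so last() throws NoSuchElementException.
            if (e.compareTo(last()) < 0) {
                pollLast();
                return super.add(e);
            }
            return false;
        }
    }

    // Hypothetical up-front check expressing the "must be positive" rule.
    static void checkPositive(String name, int value) {
        if (value <= 0) {
            throw new IllegalArgumentException(name + " must be greater than zero, got " + value);
        }
    }

    // Returns true if the very first add() on a set of this capacity throws.
    static boolean failsOnFirstAdd(int capacity) {
        try {
            new BoundedTreeSet<Long>(capacity).add(42L);
            return false;
        } catch (NoSuchElementException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("capacity -3 fails on first add: " + failsOnFirstAdd(-3));
        System.out.println("capacity  5 fails on first add: " + failsOnFirstAdd(5));
        try {
            checkPositive("hashSetSize", -3);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected up front: " + e.getMessage());
        }
    }
}
```

Under these assumptions there are two places to address the problem: reject bad arguments early (as checkPositive does), or have the test stop generating them. Which of these the attached patch does is not shown here.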

The attached patch fixes this issue.

> MinHashFilter.FixedSizeTreeSet.add() calls TreeSet.last() without first 
> testing for emptiness, under which condition NoSuchElementException is thrown
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-7442
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7442
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/analysis
>            Reporter: Steve Rowe
>         Attachments: LUCENE-7442.patch
>
>
> My Jenkins found this reproducing branch_6x seed:
> {noformat}
>    [junit4] Suite: org.apache.lucene.analysis.core.TestRandomChains
>    [junit4]   2> Exception from random analyzer: 
>    [junit4]   2> charfilters=
>    [junit4]   2> tokenizer=
>    [junit4]   2>   org.apache.lucene.analysis.standard.StandardTokenizer()
>    [junit4]   2> filters=
>    [junit4]   2>   org.apache.lucene.analysis.minhash.MinHashFilter(ValidatingTokenFilter@6ae99167 term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word, 5, 5, -3, true)
>    [junit4]   2>   org.apache.lucene.analysis.bg.BulgarianStemFilter(ValidatingTokenFilter@40844352 term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,keyword=false)
>    [junit4]   2> offsetsAreCorrect=true
>    [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestRandomChains -Dtests.method=testRandomChainsWithLargeStrings -Dtests.seed=4733E677EBDC28FC -Dtests.slow=true -Dtests.locale=ar-OM -Dtests.timezone=Atlantic/South_Georgia -Dtests.asserts=true -Dtests.file.encoding=UTF-8
>    [junit4] ERROR   3.18s J4 | TestRandomChains.testRandomChainsWithLargeStrings <<<
>    [junit4]    > Throwable #1: java.util.NoSuchElementException
>    [junit4]    >      at __randomizedtesting.SeedInfo.seed([4733E677EBDC28FC:2D685966B292080F]:0)
>    [junit4]    >      at java.util.TreeMap.key(TreeMap.java:1323)
>    [junit4]    >      at java.util.TreeMap.lastKey(TreeMap.java:297)
>    [junit4]    >      at java.util.TreeSet.last(TreeSet.java:401)
>    [junit4]    >      at org.apache.lucene.analysis.minhash.MinHashFilter$FixedSizeTreeSet.add(MinHashFilter.java:325)
>    [junit4]    >      at org.apache.lucene.analysis.minhash.MinHashFilter.incrementToken(MinHashFilter.java:159)
>    [junit4]    >      at org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:67)
>    [junit4]    >      at org.apache.lucene.analysis.bg.BulgarianStemFilter.incrementToken(BulgarianStemFilter.java:48)
>    [junit4]    >      at org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:67)
>    [junit4]    >      at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkResetException(BaseTokenStreamTestCase.java:405)
>    [junit4]    >      at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:510)
>    [junit4]    >      at org.apache.lucene.analysis.core.TestRandomChains.testRandomChainsWithLargeStrings(TestRandomChains.java:959)
>    [junit4]    >      at java.lang.Thread.run(Thread.java:745)
>    [junit4]   2> NOTE: test params are: codec=Asserting(Lucene62): 
> {dummy=Lucene50(blocksize=128)}, docValues:{}, maxPointsInLeafNode=252, 
> maxMBSortInHeap=5.297834377897023, sim=ClassicSimilarity, locale=ar-OM, 
> timezone=Atlantic/South_Georgia
>    [junit4]   2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation 
> 1.8.0_77 (64-bit)/cpus=16,threads=1,free=395080152,total=465567744
>    [junit4]   2> NOTE: All tests run in this JVM: 
> [TestDecimalDigitFilterFactory, TestMultiWordSynonyms, 
> TestReversePathHierarchyTokenizer, TestDoubleEscape, 
> TestHunspellStemFilterFactory, TestArabicNormalizationFilter, 
> TestUAX29URLEmailAnalyzer, TestSwedishLightStemFilterFactory, 
> TestBulgarianStemmer, TestASCIIFoldingFilter, 
> TestDelimitedPayloadTokenFilterFactory, TestIndonesianStemmer, TestCircumfix, 
> EdgeNGramTokenFilterTest, TestPatternTokenizer, 
> TestScandinavianFoldingFilter, TestIgnore, TestRandomChains]
>    [junit4] Completed [130/272 (1!)] on J4 in 9.85s, 2 tests, 1 error <<< 
> FAILURES!
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

