[ 
https://issues.apache.org/jira/browse/LUCENE-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524103#comment-16524103
 ] 

Erick Erickson commented on LUCENE-8370:
----------------------------------------

[~rcmuir] Thanks for looking; I just got to it.

I agree the test makes some (now) invalid assumptions about TMP 
(TieredMergePolicy).

When specifying the maximum number of segments (other than 1), TMP makes a 
"best effort" attempt to hit that target but does not guarantee it. The 
algorithm is roughly:

1> Compute the theoretical segment size to hit the target exactly, i.e. 
totalIndexBytes/numSegmentsSpecified.

2> Increase <1> by 25% (this is a totally arbitrary percentage on my part).

3> Find the "best" merges respecting the size in <2> and do them.
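
To make the arithmetic in <1> and <2> concrete, here is a minimal sketch. It is 
not the TMP code: the class and method names are hypothetical, the index size 
is made up, and the integer rounding is my own assumption.

```java
// Hypothetical illustration of steps <1> and <2>; not taken from TieredMergePolicy.
final class MergeBudgetSketch {

    // Step <1>: the segment size that would hit the target exactly.
    static long theoreticalSegmentBytes(long totalIndexBytes, int numSegmentsSpecified) {
        if (numSegmentsSpecified < 1) {
            throw new IllegalArgumentException("numSegmentsSpecified must be >= 1");
        }
        return totalIndexBytes / numSegmentsSpecified;
    }

    // Step <2>: increase the size from <1> by 25% (integer arithmetic, rounds down).
    static long maxMergedSegmentBytes(long totalIndexBytes, int numSegmentsSpecified) {
        long theoretical = theoreticalSegmentBytes(totalIndexBytes, numSegmentsSpecified);
        return theoretical + theoretical / 4;
    }

    public static void main(String[] args) {
        // A made-up 1,000,000-byte index force-merged to 4 segments:
        // 1,000,000 / 4 = 250,000, plus 25% = 312,500.
        System.out.println(maxMergedSegmentBytes(1_000_000L, 4)); // prints 312500
    }
}
```

Step <3> (scoring and choosing the merges) is deliberately left out, since it 
depends on the policy's internals.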

If the scoring algorithm happens to pick segments to merge that don't pack well 
within the limit from <2> above, there'll be more segments than specified.

What should be true in this case is that no pair of the segments that result 
from the merge will sum to less than the theoretical max size 
((totalIndexBytes/numSegmentsSpecified) * 1.25).
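
As a rough sketch of that expectation (not the actual check in 
TestTieredMergePolicy), the pairwise condition could be written as below. The 
class name, method name and segment sizes are all hypothetical; the budget of 
312,500 bytes assumes a made-up 1,000,000-byte index and 4 segments specified.

```java
// Hypothetical illustration of the pairwise expectation; not Lucene test code.
final class PairInvariantSketch {

    // Returns true when no two segments sum to less than the budget,
    // i.e. no pair could still have been merged within the size limit.
    static boolean noPairFitsBudget(long[] segmentBytes, long maxMergedSegmentBytes) {
        for (int i = 0; i < segmentBytes.length; i++) {
            for (int j = i + 1; j < segmentBytes.length; j++) {
                if (segmentBytes[i] + segmentBytes[j] < maxMergedSegmentBytes) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long budget = (1_000_000L / 4) * 5 / 4; // 312,500 bytes

        // Five segments where 4 were requested, yet every pair sums to
        // 400,000 >= 312,500, so the expectation still holds.
        long[] poorlyPacked = {200_000L, 200_000L, 200_000L, 200_000L, 200_000L};
        System.out.println(noPairFitsBudget(poorlyPacked, budget)); // prints true

        // Here the two 100,000-byte segments sum to 200,000 < 312,500,
        // so they should have been merged and the expectation is violated.
        long[] mergeMissed = {300_000L, 300_000L, 200_000L, 100_000L, 100_000L};
        System.out.println(noPairFitsBudget(mergeMissed, budget)); // prints false
    }
}
```

The first case shows how a result like "limit=4 actual=5" can be consistent 
with the policy behaving as intended.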

TestTieredMergePolicy does test this expectation.

I can take this assert out of RandomIndexWriter for this specific policy (TMP) 
or remove it completely, WDYT? More precisely, the options are to skip it in 
RandomIndexWriter for TMP altogether, or only when the policy is TMP and the 
number of segments specified is > 1.

[~mikemccand] any opinions? This is the "scary loop" from LUCENE-7976 that made 
us both nervous and that I removed.

> Reproducing 
> TestLucene70DocValuesFormat.testSortedSetVariableLengthBigVsStoredFields() 
> failure
> ----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-8370
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8370
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index, general/test
>            Reporter: Steve Rowe
>            Assignee: Erick Erickson
>            Priority: Major
>
> Policeman Jenkins found a reproducing seed for a 
> {{TestLucene70DocValuesFormat.testSortedSetVariableLengthBigVsStoredFields()}}
>  failure [https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/22320/]; 
> {{git bisect}} blames commit {{2519025}} on LUCENE-7976:
> {noformat}
> Checking out Revision 8c714348aeea51df19e7603905f85995bcf0371c 
> (refs/remotes/origin/master)
> [...]
>    [junit4] Suite: 
> org.apache.lucene.codecs.lucene70.TestLucene70DocValuesFormat
>    [junit4]   2> NOTE: reproduce with: ant test  
> -Dtestcase=TestLucene70DocValuesFormat 
> -Dtests.method=testSortedSetVariableLengthBigVsStoredFields 
> -Dtests.seed=63A61B46A6934B1A -Dtests.multiplier=3 -Dtests.slow=true 
> -Dtests.locale=sw-TZ -Dtests.timezone=Pacific/Pitcairn -Dtests.asserts=true 
> -Dtests.file.encoding=US-ASCII
>    [junit4] FAILURE 23.3s J2 | 
> TestLucene70DocValuesFormat.testSortedSetVariableLengthBigVsStoredFields <<<
>    [junit4]    > Throwable #1: java.lang.AssertionError: limit=4 actual=5
>    [junit4]    >      at 
> __randomizedtesting.SeedInfo.seed([63A61B46A6934B1A:6BE93FA35E02851]:0)
>    [junit4]    >      at 
> org.apache.lucene.index.RandomIndexWriter.doRandomForceMerge(RandomIndexWriter.java:372)
>    [junit4]    >      at 
> org.apache.lucene.index.RandomIndexWriter.getReader(RandomIndexWriter.java:386)
>    [junit4]    >      at 
> org.apache.lucene.index.RandomIndexWriter.getReader(RandomIndexWriter.java:332)
>    [junit4]    >      at 
> org.apache.lucene.index.BaseDocValuesFormatTestCase.doTestSortedSetVsStoredFields(BaseDocValuesFormatTestCase.java:2155)
>    [junit4]    >      at 
> org.apache.lucene.codecs.lucene70.TestLucene70DocValuesFormat.testSortedSetVariableLengthBigVsStoredFields(TestLucene70DocValuesFormat.java:93)
>    [junit4]    >      at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>    [junit4]    >      at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>    [junit4]    >      at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>    [junit4]    >      at 
> java.base/java.lang.reflect.Method.invoke(Method.java:564)
>    [junit4]    >      at java.base/java.lang.Thread.run(Thread.java:844)
> [...]
>    [junit4]   2> NOTE: test params are: codec=Asserting(Lucene70): {}, 
> docValues:{}, maxPointsInLeafNode=693, maxMBSortInHeap=5.078503794479895, 
> sim=Asserting(org.apache.lucene.search.similarities.AssertingSimilarity@20a604e6),
>  locale=sw-TZ, timezone=Pacific/Pitcairn
>    [junit4]   2> NOTE: Linux 4.13.0-41-generic amd64/Oracle Corporation 9.0.4 
> (64-bit)/cpus=8,threads=1,free=352300304,total=518979584
> {noformat}



