[ 
https://issues.apache.org/jira/browse/SOLR-5122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739893#comment-13739893
 ] 

Hoss Man commented on SOLR-5122:
--------------------------------

bq. So prior to random merges, the test naively assumed everything was on 1 
segment. Now with multiple, all bets are off and I don't think we can be 
estimating hits.

I'm not following you here -- why don't you think the basic approach to 
estimation can still work?

the only missing pieces seem to be that when an estimation is requested:
* docs *must* be collected in order -- a property that forces this behavior 
from EarlyTerminatingCollector.acceptsDocsOutOfOrder regardless of what the 
delegate cares about should do the trick.
* lastDocId needs to be absolute, not per-segment -- which could be done by 
tracking the reader offsets in EarlyTerminatingCollector.setNextReader and 
using that offset when assigning lastDocId in EarlyTerminatingCollector.collect

...and that should make it work as you previously designed it ... right?
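
For illustration only, here is a minimal standalone sketch of the two changes 
suggested above. It does not use the Lucene/Solr classes: AbsoluteDocIdSketch, 
setNextSegment, run and estimateHits are hypothetical names, and the linear 
extrapolation in estimateHits is an assumed formula, not one taken from the 
Solr source. It only shows why lastDocId has to include the segment offset 
when docs are collected in order across several segments.

```java
/** Simplified stand-in for an early-terminating collector (not the Solr class). */
final class AbsoluteDocIdSketch {
    private final int maxDocsToCollect;
    private int docBase = 0;      // offset of the segment currently being collected
    private int numCollected = 0;
    private int lastDocId = -1;   // absolute (index-wide) id of the last collected doc

    AbsoluteDocIdSketch(int maxDocsToCollect) {
        this.maxDocsToCollect = maxDocsToCollect;
    }

    /** Plays the role of setNextReader: remember the new segment's offset. */
    void setNextSegment(int segmentDocBase) {
        this.docBase = segmentDocBase;
    }

    /** Plays the role of collect: returns false once the limit is reached. */
    boolean collect(int segmentLocalDoc) {
        lastDocId = docBase + segmentLocalDoc; // absolute, not per-segment
        numCollected++;
        return numCollected < maxDocsToCollect;
    }

    /** Estimation assumes in-order collection, so never accept out-of-order docs. */
    boolean acceptsDocsOutOfOrder() {
        return false;
    }

    int numCollected() { return numCollected; }
    int lastDocId() { return lastDocId; }

    /** Hypothetical driver: feeds matching docs segment by segment, in order. */
    static AbsoluteDocIdSketch run(int[] docBases, int[][] matchesPerSegment, int limit) {
        AbsoluteDocIdSketch c = new AbsoluteDocIdSketch(limit);
        for (int seg = 0; seg < docBases.length; seg++) {
            c.setNextSegment(docBases[seg]);
            for (int localDoc : matchesPerSegment[seg]) {
                if (!c.collect(localDoc)) {
                    return c; // early termination
                }
            }
        }
        return c;
    }

    /**
     * Assumed extrapolation: docs 0..lastDocId were scanned in order, so scale
     * the collected count up to the whole index. lastDocId + 1 is never zero
     * once at least one doc has been collected.
     */
    static int estimateHits(int maxDoc, int numCollected, int lastDocId) {
        return (int) ((long) maxDoc * numCollected / (lastDocId + 1L));
    }
}
```

With two segments whose offsets are 0 and 100, a limit of 3, and matches at 
local ids 10 and 50 in the first segment and 0 in the second, the sketch stops 
with lastDocId at 100. A per-segment value would have been 0 at that point, 
which is the kind of input that makes an extrapolation meaningless or divides 
by zero.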

                
> spellcheck.collateMaxCollectDocs estimates seem to be meaningless -- can lead 
> to "ArithmeticException: / by zero"
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-5122
>                 URL: https://issues.apache.org/jira/browse/SOLR-5122
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 4.4
>            Reporter: Hoss Man
>            Assignee: James Dyer
>         Attachments: SOLR-5122.patch, SOLR-5122.patch
>
>
> As part of SOLR-4952 SpellCheckCollatorTest started using RandomMergePolicy, 
> and this (apparently) led to a failure in testEstimatedHitCounts.
> As far as I can tell: the test assumes that specific values would be returned 
> as the _estimated_ "hits" for a collation, and it appears that the change in 
> MergePolicy resulted in different segments with different term stats, 
> causing the estimation code to produce different values than what is expected.
> I made a quick attempt to improve the test to:
>  * expect explicit exact values only when spellcheck.collateMaxCollectDocs is 
> set such that the "estimate" should actually be exact (ie: 
> collateMaxCollectDocs == 0 or collateMaxCollectDocs greater than the num 
> docs in the index)
>  * randomize the values used for collateMaxCollectDocs and confirm that the 
> estimates are never more than the num docs in the index
> This led to an odd "ArithmeticException: / by zero" error in the test, which 
> seems to suggest that there is a genuine bug in the code for estimating the 
> hits that only gets tickled in certain 
> mergepolicy/segment/collateMaxCollectDocs combinations.
> *Update:* This appears to be a general problem with collecting docs out of 
> order and the estimation of hits -- i believe even if there is no divide by 
> zero error, the estimates are largely meaningless since the docs are 
> collected out of order.
