[ https://issues.apache.org/jira/browse/SOLR-5344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15629832#comment-15629832 ]

James Dyer commented on SOLR-5344:
----------------------------------

OK, I think I know what's going on here.  This feature is supposed to estimate
hit counts for spelling corrections in cases where the client doesn't care about
the exact # of hits, only that a particular collation, if re-queried, would
return something.  To get estimates, you tell it the max # of documents you
would like it to collect before quitting.  It then estimates how many hits it
would have counted with this:

{noformat}
maximum-doc-id * number-of-docs-collected / (# visited docs + last-doc-id + 1)
{noformat}
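To make the denominator's effect concrete, here is a rough Java sketch of that formula. It is hypothetical, not Solr's actual SpellCheckCollator code: the class name HitEstimate, the method estimateHits and its parameter names are made up for illustration, and truncating the result to an int is an assumption about rounding.

```java
public class HitEstimate {

    /**
     * Hypothetical sketch of the estimate quoted above:
     *   maximum-doc-id * number-of-docs-collected / (# visited docs + last-doc-id + 1)
     * The cast to int truncates toward zero (an assumption about rounding).
     */
    static int estimateHits(int maxDocId, int numCollected, int visitedDocs, int lastDocId) {
        int denominator = visitedDocs + lastDocId + 1;
        if (denominator <= 0) {
            throw new IllegalArgumentException("denominator must be positive: " + denominator);
        }
        return (int) ((double) maxDocId * numCollected / denominator);
    }

    public static void main(String[] args) {
        // Same 17-document index, same 5 collected hits: only the denominator changes.
        System.out.println(estimateHits(17, 5, 0, 9));   // 85 / 10 -> prints 8
        System.out.println(estimateHits(17, 5, 0, 13));  // 85 / 14 -> prints 6
        System.out.println(estimateHits(17, 5, 4, 9));   // 85 / 14 -> prints 6
    }
}
```

With the number of collected docs held fixed, the estimate moves purely with where the last collected doc happens to sit and how many docs were visited before it.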

In the failing test, we ask it to collect between 5 and 20 documents.  The 
max-doc-id is always 17 (there are 17 documents and no deletions).

But the denominator depends on the # of visited documents, and also on the doc
id of the one that happened to be visited last.  Given randomized testing and
release-specific index behavior, I think the best we can hope for is a
worst-case range of estimates between 2 and 15.  The actual correct value is 8.

So unless there are objections, I am going to relax the requirement of
6 <= hits <= 10 and use 2 <= hits <= 15.  Maybe we could do better than this,
but I would think anyone using this feature probably does not need to know more
than whether or not hits can be produced, or the relative # of hits among
several returned collations.
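The proposed change amounts to widening an inclusive range check on the estimated hit count. The sketch below is hypothetical and does not reproduce the actual test code: the class EstimateBounds, its helper methods and the sample estimates are made up for illustration, and only the two windows (6..10 and 2..15) come from this comment.

```java
public class EstimateBounds {

    // Hypothetical helper: inclusive range check on an estimated hit count.
    static boolean inRange(int hits, int min, int max) {
        return min <= hits && hits <= max;
    }

    // Original window from the test: 6 <= hits <= 10.
    static boolean passesOldCheck(int hits) {
        return inRange(hits, 6, 10);
    }

    // Proposed relaxed window: 2 <= hits <= 15.
    static boolean passesRelaxedCheck(int hits) {
        return inRange(hits, 2, 15);
    }

    public static void main(String[] args) {
        // Illustrative estimates only, not values observed in the Jenkins failures.
        int[] estimates = {2, 5, 8, 12, 15, 16};
        for (int hits : estimates) {
            System.out.printf("hits=%2d old=%b relaxed=%b%n",
                    hits, passesOldCheck(hits), passesRelaxedCheck(hits));
        }
    }
}
```

The exact value (8) passes both checks; the relaxed window only tolerates estimates that drift further from it because of the denominator.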

> SpellCheckCollatorTest.testEstimatedHitCounts fails in jenkins from time to time
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-5344
>                 URL: https://issues.apache.org/jira/browse/SOLR-5344
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Robert Muir
>            Assignee: James Dyer
>
> Doesn't happen very often, but maybe one I can fix. I'll look into it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)