[ 
https://issues.apache.org/jira/browse/SOLR-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228492#comment-13228492
 ] 

James Dyer commented on SOLR-3240:
----------------------------------

collation.hits is just metadata for the user, so I think what you want to do 
would be entirely valid.  

The estimates would only be good if the hits are somewhat evenly distributed 
across the index, right?  For instance, if you're indexing documents by topic 
and a bunch of new docs on the same topic get added around the same time, 
those docs end up with neighboring docids and you'd get a cluster of hits in 
one place.  
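
To make the skew concrete, here is a minimal sketch in plain Java (no Lucene 
or Solr classes).  It assumes the estimate is a linear extrapolation over the 
docid space: collect the first n matches in docid order, then scale by 
maxDoc / (lastDoc + 1).  The class ApproxHitCount, the method estimate, and 
the int[] standing in for a query's match iterator are all hypothetical names 
for illustration, not anything that exists in Solr.

```java
public final class ApproxHitCount {
    private ApproxHitCount() {}

    /**
     * Hypothetical estimator: extrapolates a total hit count from the first n matches.
     *
     * @param sortedMatches docids of all matches in ascending order (a stand-in
     *                      for walking a real query's matches in docid order)
     * @param maxDoc        size of the docid space
     * @param n             number of docs to collect before terminating early
     */
    public static long estimate(int[] sortedMatches, int maxDoc, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        int collected = 0;
        int lastDoc = -1;
        for (int doc : sortedMatches) {
            collected++;
            lastDoc = doc;
            if (collected >= n) {
                break; // early termination: stop after n hits
            }
        }
        if (collected < n) {
            // Ran out of matches before reaching n, so this count is exact.
            return collected;
        }
        // We saw `collected` hits in the first (lastDoc + 1) docids; assume the
        // same density holds for the rest of the docid space.
        return Math.round((double) collected * maxDoc / (lastDoc + 1));
    }
}
```

With 100 real hits in a 1000-doc index and n=5, this returns 100 when the hits 
fall on every tenth docid, but 6 when the same 100 hits sit at docids 900..999 
and 1000 when they sit at docids 0..99.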

Even so, like you say, many (most) people would rather improve performance than 
have an accurate (any) hit count returned.

Beyond this, there are also some dead-simple optimizations we can make by 
simply removing any sorting & boosting parameters from the query before testing 
the collation.
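
As a rough illustration of that idea, the sketch below copies a request's 
parameters and drops the ones assumed to influence only ordering or scoring.  
sort, bq, bf and boost are real Solr parameter names, but treating exactly 
this set as safe to remove is an assumption, and CollationTestParams and 
stripOrderingParams are hypothetical names, not existing Solr code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public final class CollationTestParams {
    private CollationTestParams() {}

    // Assumption: these parameters change ranking only, not which docs match.
    private static final Set<String> ORDER_ONLY = Set.of("sort", "bq", "bf", "boost");

    /**
     * Returns a copy of the request parameters without sorting/boosting
     * parameters, for use when a collation is only being tested for hits.
     * The input map is left unmodified.
     */
    public static Map<String, String[]> stripOrderingParams(Map<String, String[]> params) {
        Map<String, String[]> stripped = new LinkedHashMap<>(params);
        stripped.keySet().removeAll(ORDER_ONLY);
        return stripped;
    }
}
```

Filters such as fq are deliberately kept, since they do change which 
documents a collation can hit.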
                
> add spellcheck 'approximate collation count' mode
> -------------------------------------------------
>
>                 Key: SOLR-3240
>                 URL: https://issues.apache.org/jira/browse/SOLR-3240
>             Project: Solr
>          Issue Type: Improvement
>          Components: spellchecker
>            Reporter: Robert Muir
>
> SpellCheck's Collation in Solr is a way to ensure spellcheck/suggestions
> will actually net results (taking into account context like filtering).
> In order to do this (from my understanding), it generates candidate queries,
> executes them, and saves the total hit count: collation.setHits(hits).
> For a large index it seems this might be doing too much work: in particular
> I'm interested in ensuring this feature can work fast enough/well for 
> autosuggesters.
> So I think we should offer an 'approximate' mode that uses an
> early-terminating Collector, collect()ing only N docs (e.g. N=1), and we
> approximate this result count based on docid space.
> I'm not sure what needs to happen on the Solr side (possibly support for
> custom collectors?), but I think this could help and should possibly be
> the default.
