[jira] [Updated] (SOLR-3240) add spellcheck 'approximate collation count' mode

James Dyer (JIRA) Wed, 13 Jun 2012 09:39:45 -0700

     [ https://issues.apache.org/jira/browse/SOLR-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


James Dyer updated SOLR-3240:
-----------------------------

    Attachment: SOLR-3240.patch

Ok.  I think I have a version here that will never compute scores, without 
having to write a lot of special code for it.

As best I can tell, when "collateMaxCollectDocs" is 0 or not specified, it 
will use the first inner-class Collector in SolrIndexSearcher#getDocListNC 
(which is almost identical to TotalHitCountCollector).  Otherwise, it will use 
OneComparatorNonScoringCollector with the sort being on "<id>".  These queries 
will also make use of the Solr filter cache and query result cache when they 
can.

The unit tests make it easy to verify that it returns the estimate you'd 
expect.  What I can't so easily test is whether it still picks a non-scoring 
collector when hit reporting is turned off entirely 
(collateExtendedResults=false).  I would like to add a test for this, but I'm 
not sure what the least invasive approach would be.

I'm also thinking I can safely get rid of the "forceInorderCollection" flag 
because requesting docs sorted by doc-id would enforce the same thing, right?
                
> add spellcheck 'approximate collation count' mode
> -------------------------------------------------
>
>                 Key: SOLR-3240
>                 URL: https://issues.apache.org/jira/browse/SOLR-3240
>             Project: Solr
>          Issue Type: Improvement
>          Components: spellchecker
>            Reporter: Robert Muir
>         Attachments: SOLR-3240.patch, SOLR-3240.patch
>
>
> SpellCheck's Collation in Solr is a way to ensure spellcheck/suggestions
> will actually net results (taking into account context like filtering).
> In order to do this (from my understanding), it generates candidate queries,
> executes them, and saves the total hit count: collation.setHits(hits).
> For a large index it seems this might be doing too much work: in particular
> I'm interested in ensuring this feature can work fast enough/well for 
> autosuggesters.
> So I think we should offer an 'approximate' mode that uses an
> early-terminating Collector, collect()ing only N docs (e.g. N=1), and we
> approximate this result count based on docid space.
> I'm not sure what needs to happen on the solr side (possibly support for
> custom collectors?), but I think this could help and should possibly be
> the default.
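
As an illustration of the "approximate this result count based on docid 
space" idea described above, here is a minimal, self-contained sketch in plain 
Java.  It is not the code in the attached patch and it does not use any Lucene 
or Solr classes: the class name ApproxCollationCount, the Partial holder, and 
the collect/estimateHits methods are all hypothetical, and the extrapolation 
formula assumes that matching documents are spread roughly evenly across the 
docid space.  Matching docids are modelled as a sorted int array standing in 
for in-order collection.

```java
public class ApproxCollationCount {

    /** Hypothetical holder for the outcome of one (possibly truncated) collection pass. */
    public static final class Partial {
        public final int collected;           // number of docs actually collected
        public final int lastDocId;           // highest docid collected, or -1 if none
        public final boolean terminatedEarly; // true if we stopped before seeing all matches

        Partial(int collected, int lastDocId, boolean terminatedEarly) {
            this.collected = collected;
            this.lastDocId = lastDocId;
            this.terminatedEarly = terminatedEarly;
        }
    }

    /**
     * Walks matching docids in increasing order and stops once maxCollectDocs
     * have been collected. A value of 0 means "no limit" (exact count),
     * mirroring how the comment above describes collateMaxCollectDocs.
     */
    public static Partial collect(int[] sortedMatchingDocIds, int maxCollectDocs) {
        int collected = 0;
        int lastDocId = -1;
        for (int docId : sortedMatchingDocIds) {
            if (maxCollectDocs > 0 && collected >= maxCollectDocs) {
                // Another match exists beyond the limit, so the count is incomplete.
                return new Partial(collected, lastDocId, true);
            }
            collected++;
            lastDocId = docId;
        }
        return new Partial(collected, lastDocId, false);
    }

    /**
     * Assumed estimator: if `collected` hits were found in docids
     * [0, lastDocId], extrapolate that density to the whole index of
     * maxDoc documents. Returns the exact count when nothing was cut off.
     */
    public static long estimateHits(Partial p, int maxDoc) {
        if (!p.terminatedEarly || p.collected == 0) {
            return p.collected;
        }
        return Math.round((double) p.collected * maxDoc / (p.lastDocId + 1));
    }

    public static void main(String[] args) {
        int maxDoc = 1000;
        int[] matches = new int[100];
        for (int i = 0; i < matches.length; i++) {
            matches[i] = i * 10; // every 10th document matches
        }
        System.out.println("exact:    " + estimateHits(collect(matches, 0), maxDoc));
        System.out.println("estimate: " + estimateHits(collect(matches, 5), maxDoc));
    }
}
```

With a small N the estimate is cheap but coarse, and it degrades when matches 
cluster at one end of the index, which is one reason a real implementation 
might expose N as a tunable rather than fixing it.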

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
