[ 
https://issues.apache.org/jira/browse/SOLR-8236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom Winch updated SOLR-8236:
----------------------------
    Description: 
This issue describes a search component for estimating numFound in federated 
search: that is, distributed search over documents stored in separate Solr 
instances (for example, one server per continent), where a single document 
(identified by an agreed, common unique id) may be stored on more than one 
instance, possibly with differing fields and data.

When documents are present on more than one distributed server, which is 
normally the case in federated search, the numFound reported by the search is 
incorrect: summing the per-server counts counts a shared document once for 
every server that holds it. For small result sets we may return all the 
document ids matching the query from each server and compute an exact numFound. 
For large result sets this is impractical, and the numFound may instead be 
estimated using statistical techniques.
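
To illustrate the small-result-set case, here is a minimal sketch that 
deduplicates ids across servers. The function name, the server labels, and the 
sample ids are hypothetical, and this is not the component's actual code:

```python
def exact_num_found(ids_per_server):
    """Count distinct document ids across all servers (hypothetical helper)."""
    distinct = set()
    for ids in ids_per_server:
        distinct.update(ids)  # ids already seen on another server are ignored
    return len(distinct)


# Made-up sample data: two servers that share doc2 and doc3.
ids_per_server = [
    ["doc1", "doc2", "doc3"],  # hypothetical server A
    ["doc2", "doc3", "doc4"],  # hypothetical server B
]

naive = sum(len(ids) for ids in ids_per_server)  # per-server counts summed
exact = exact_num_found(ids_per_server)          # duplicates counted once
print(naive, exact)
```

The set has to hold every matching id from every server, which is why this 
approach only suits small result sets.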

Statistical techniques may be driven by the following heuristic: if two shards 
always return the same numFound for queries, then they likely contain the same 
document ids, and the combined numFound equals the numFound of either shard. On 
the other hand, if two shards always return different numFounds for queries, 
then they likely contain independent document ids, and the numFounds should be 
summed.
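
One way to picture the heuristic is to interpolate between its two extremes. 
The sketch below is only an illustration under stated assumptions: the issue 
does not specify an estimator, and the agreement-rate weighting, the function 
names, and the history format (pairs of past numFound values) are all 
hypothetical.

```python
def agreement_rate(history):
    """Fraction of past queries where both shards reported the same numFound.

    `history` is a list of (num_found_a, num_found_b) pairs. This weighting is
    an assumption made for illustration, not the issue's estimator.
    """
    if not history:
        return 0.0  # assumption: with no evidence, treat shards as independent
    same = sum(1 for a, b in history if a == b)
    return same / len(history)


def estimate_num_found(a, b, history):
    """Blend the two extremes of the heuristic by the agreement rate."""
    rate = agreement_rate(history)
    overlapping = max(a, b)  # shards hold the same ids: do not double count
    independent = a + b      # shards hold disjoint ids: sum the counts
    return rate * overlapping + (1.0 - rate) * independent


# Shards that always agreed in the past: combined count equals either shard's.
print(estimate_num_found(100, 100, [(5, 5), (7, 7)]))
# Shards that never agreed in the past: counts are summed.
print(estimate_num_found(100, 40, [(5, 3), (7, 1)]))
```

A real estimator would presumably need a finer signal than exact equality of 
counts, since partially overlapping shards rarely report identical numFounds.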

This issue combines with others to provide full federated search support. See 
also SOLR-8234 and SOLR-8235.

–

Note that this is part of a new implementation of federated search as opposed 
to the older issues SOLR-3799 through SOLR-3805.

> Federated Search (new) - NumFound
> ---------------------------------
>
>                 Key: SOLR-8236
>                 URL: https://issues.apache.org/jira/browse/SOLR-8236
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Tom Winch
>            Priority: Minor



