[
https://issues.apache.org/jira/browse/SOLR-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085580#comment-14085580
]
Yonik Seeley commented on SOLR-6319:
------------------------------------
bq. the comment that you quoted explicitly says "Overrequesting can help a
little here, but not as much as when sorting by count" but then does no
overrequesting at all
Actually, a type of over-requesting is already built-in.
Say you have 10 shards and request the top 20.
If the stars align, one can get correct results by requesting just 2 items per
shard (10 shards x 2 = 20). There is obviously a high chance of errors, though
it depends on the data. As one requests more data per shard, the chance of
error decreases.
I'm not sure there's anything magic about "20", except for the fact that if
all results are on one shard then we are still OK. In the general case of data
being randomized across shards though, there doesn't seem to be anything
special about "20". So we request 20 from each of the 10 shards, a total of
200, and select the top 20... there's your built-in over-request.
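To make the arithmetic above concrete, here is a minimal sketch of an index-order facet merge. It is a simplified model, not Solr's implementation: the function names, the term names, and the rule that each shard simply returns its first N terms in index order are all assumptions made for illustration.

```python
from collections import Counter

def shard_response(shard_counts, per_shard_limit):
    """First `per_shard_limit` terms of one shard, in index (lexical) order."""
    terms = sorted(t for t, c in shard_counts.items() if c > 0)
    return {t: shard_counts[t] for t in terms[:per_shard_limit]}

def merge_index_order(shards, limit, per_shard_limit, mincount=1):
    """Sum the shard responses, apply mincount, keep the first `limit` terms."""
    merged = Counter()
    for shard in shards:
        merged.update(shard_response(shard, per_shard_limit))
    kept = sorted(t for t, c in merged.items() if c >= mincount)
    return [(t, merged[t]) for t in kept[:limit]]

# 40 hypothetical terms t00..t39; the true top 20 in index order is t00..t19.
truth = [(f"t{i:02d}", 1) for i in range(20)]

# "Stars align": terms spread evenly, so each of the 10 shards holds
# exactly 2 of the top 20 terms.
shards_even = [
    {f"t{i:02d}": 1 for i in range(40) if i % 10 == s} for s in range(10)
]

# Worst case: every term lives on a single shard.
shards_skewed = [{f"t{i:02d}": 1 for i in range(40)}] + [{} for _ in range(9)]

print(merge_index_order(shards_even, 20, 2) == truth)     # True
print(merge_index_order(shards_skewed, 20, 2) == truth)   # False
print(merge_index_order(shards_skewed, 20, 20) == truth)  # True
```

Requesting 2 per shard is only right when the data happens to be spread evenly; requesting the full 20 per shard (200 in total) also covers the case where everything sits on one shard.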
And even if there were not a built-in over-request, just because "it can help a
little here" says nothing about whether it's worth the cost or not.
Looking at your example, I might be convinced of an over-request of the form of
"+10" or something to handle the very low limit cases, but I don't think we
should apply a multiplier by default, as is done with sort-by-count.
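As a rough illustration of why an additive over-request suits very low limits better than a multiplier, the sketch below compares the two. The function names, the "+10" default, and the 1.5 ratio are placeholder values chosen for the example, not a statement of what Solr does.

```python
def additive_overrequest(limit, extra=10):
    """Per-shard limit with a fixed number of extra items (hypothetical '+10')."""
    return limit + extra

def multiplicative_overrequest(limit, ratio=1.5):
    """Per-shard limit scaled by a ratio (hypothetical value), truncated to an int."""
    return int(limit * ratio)

for limit in (1, 2, 20, 100):
    print(limit, additive_overrequest(limit), multiplicative_overrequest(limit))
```

For a limit of 1 the multiplier adds nothing, while the additive form still asks each shard for 11 items; for a limit of 100 the multiplier asks for 150 per shard and the additive form for 110.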
Anyway, if you are still asserting that lack of over-requesting *is* a bug...
please post a patch that attempts to fix things via over-requesting only, and
then I'll show you an example that still breaks :-)
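One way such a counterexample could look is sketched below. This is a guess at the kind of case meant, not the example the author had in mind, and it uses a simplified model: each shard returns its first limit + overrequest terms in index order, and the coordinator sums the counts and applies mincount. All names and data are hypothetical.

```python
from collections import Counter

def distributed_facet(shards, limit, mincount, overrequest):
    """Index-order facet merge where each shard returns limit + overrequest terms."""
    per_shard = limit + overrequest
    merged = Counter()
    for shard in shards:
        for term in sorted(shard)[:per_shard]:
            merged[term] += shard[term]
    kept = sorted(t for t, c in merged.items() if c >= mincount)
    return [(t, merged[t]) for t in kept[:limit]]

def exact_facet(shards, limit, mincount):
    """Reference answer computed from the complete counts of every shard."""
    total = Counter()
    for shard in shards:
        total.update(shard)
    kept = sorted(t for t, c in total.items() if c >= mincount)
    return [(t, total[t]) for t in kept[:limit]]

def adversarial_shards(limit, overrequest):
    """Two shards built to defeat a given over-request amount.

    Shard 0 has enough low-count 'noise' terms, sorting first, to fill its
    whole response. The real terms reach mincount=2 only when both shards
    report them.
    """
    noise = {f"n{i:03d}": 1 for i in range(limit + overrequest)}
    real = {f"z{i}": 1 for i in range(limit)}
    return [{**noise, **real}, dict(real)]

shards = adversarial_shards(limit=2, overrequest=10)
print(exact_facet(shards, limit=2, mincount=2))
print(distributed_facet(shards, limit=2, mincount=2, overrequest=10))
```

In this model the merged result comes back empty even though two terms qualify, because shard 0 spends its entire response on terms that mincount later filters out. Raising the over-request fixes this particular data set, but `adversarial_shards` can always be rebuilt for the new amount.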
> consider increasing over-request amount when sorting by index with mincount > 1
> -------------------------------------------------------------------------------
>
> Key: SOLR-6319
> URL: https://issues.apache.org/jira/browse/SOLR-6319
> Project: Solr
> Issue Type: Improvement
> Reporter: Hoss Man
> Assignee: Hoss Man
> Priority: Minor
>
> Discovered this while working on SOLR-2894. The logic for distributed
> faceting ignores over-requesting (beyond the user-specified facet.limit) if
> the facet.sort is index order, but the rationale for doing this falls apart
> if the user has specified a facet.mincount > 1.
--
This message was sent by Atlassian JIRA
(v6.2#6252)