[ https://issues.apache.org/jira/browse/SOLR-9152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297007#comment-15297007 ]
Keith Laban commented on SOLR-9152:
-----------------------------------
The original concern in SOLR-8988 was that it would affect refinement. I can't
see a reason why it would. Additionally, in all of the testing I've done, I've
seen only improvements.
> Change the default of facet.distrib.mco from false to true
> ----------------------------------------------------------
>
> Key: SOLR-9152
> URL: https://issues.apache.org/jira/browse/SOLR-9152
> Project: Solr
> Issue Type: Improvement
> Reporter: Dennis Gove
> Priority: Minor
>
> SOLR-8988 added a new query option, facet.distrib.mco, which when set to true
> allows the use of facet.mincount=1 in cloud mode. The previous behavior, and
> the current default, is to use facet.mincount=0 in cloud mode.
> h3. What exactly would be changed?
> The default of facet.distrib.mco=false would be changed to
> facet.distrib.mco=true.
> h3. When is this option effective?
> From the documentation,
> {code}
> /**
> * If we are returning facet field counts, are sorting those facets by their
> * count, and the minimum count to return is > 0, then allow the use of
> * facet.mincount = 1 in cloud mode. To enable this use facet.distrib.mco=true.
> *
> * i.e. If the following three conditions are met in cloud mode:
> * facet.sort=count, facet.limit > 0, facet.mincount > 0.
> * Then use facet.mincount=1.
> *
> * Previously and by default facet.mincount will be explicitly set to 0 when
> * in cloud mode for this condition.
> * In SOLR-8599 and SOLR-8988, significant performance increase has been seen
> * when enabling this optimization.
> *
> * Note: enabling this flag has no effect when the conditions above are not
> * met. For those other cases the default behavior is sufficient.
> */
> {code}
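The conditions in the comment above can be read as a small decision rule. The following is a minimal sketch of that rule, not Solr's actual implementation: the class ShardMinCount and the method shardMinCount are hypothetical names, and the sketch deliberately returns nothing for requests that do not meet the three conditions, since the quoted comment does not say what the default behavior is in those cases.

```java
import java.util.OptionalInt;

// Hypothetical sketch of the rule described in the quoted comment; not Solr's code.
final class ShardMinCount {

    /**
     * Returns the facet.mincount that would be sent to each shard when the
     * three conditions hold (facet.sort=count, facet.limit > 0,
     * facet.mincount > 0). Returns an empty value otherwise, because the
     * flag has no effect in those cases and this sketch does not model them.
     */
    static OptionalInt shardMinCount(boolean distribMco, String facetSort,
                                     int facetLimit, int facetMinCount) {
        boolean conditionsMet = "count".equals(facetSort)
                && facetLimit > 0
                && facetMinCount > 0;
        if (!conditionsMet) {
            return OptionalInt.empty(); // flag is irrelevant here
        }
        // With the flag on, shards are asked for mincount=1;
        // with it off, the previous behavior of mincount=0 applies.
        return OptionalInt.of(distribMco ? 1 : 0);
    }

    public static void main(String[] args) {
        System.out.println(shardMinCount(true, "count", 10, 1));
        System.out.println(shardMinCount(false, "count", 10, 1));
        System.out.println(shardMinCount(true, "index", 10, 1));
    }
}
```

Under this reading, changing the default only changes the value returned on the last line of the method when the caller does not set the flag explicitly.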
> h3. What is the result of turning this option on?
> When facet.distrib.mco=true is used, and the conditions above are met, then
> when Solr is sending requests off to the various shards it will include
> facet.mincount=1. The result of this is that only terms with a count > 0 will
> be considered when processing the request for that shard. This can result in
> a significant performance gain when the field has high cardinality and the
> matching docset is relatively small because terms with 0 matches will not be
> considered.
> As shown in SOLR-8988, the runtime of a single query was reduced from 20
> seconds to less than 1 second.
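For illustration, a request that meets the three conditions and opts in explicitly might carry the parameters built below. This is only a sketch: the class McoRequestSketch, the field name "category", and the use of the /select handler path are assumptions for the example, while the facet.* parameter names are the ones discussed in this issue.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper that assembles a query string; it does not contact Solr.
final class McoRequestSketch {

    static String queryString(String facetField, int facetLimit) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        params.put("facet", "true");
        params.put("facet.field", facetField);
        params.put("facet.sort", "count");                     // condition 1
        params.put("facet.limit", String.valueOf(facetLimit)); // condition 2 when > 0
        params.put("facet.mincount", "1");                     // condition 3
        params.put("facet.distrib.mco", "true");               // explicit opt-in
        return params.entrySet().stream()
                .map(e -> e.getKey() + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        // "category" is a made-up field name used only for this example.
        System.out.println("/select?" + queryString("category", 10));
    }
}
```

If the default were changed as proposed, the last parameter would no longer need to be sent for the per-shard facet.mincount=1 behavior to apply.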
> h3. Can this change result in worse performance?
> The current thinking is no: worse performance is not expected even under
> non-optimal scenarios. From the comments in SOLR-8988,
> {quote}
> Consider you asked for up to 10 terms from shardA with mincount=1 but you
> received only 5 terms back. In this case you know, definitively, that a term
> seen in the response from shardB but not in the response from shardA could
> have at most a count of 0 in shardA. If it had any other count in shardA then
> it would have been returned in the response from shardA.
> Also, if you asked for up to 10 terms from shardA with mincount=1 and you get
> back a response with 10 terms having a count >= 1 then the response is
> identical to the one you'd have received if mincount=0.
> Because of this, there isn't a scenario where the response would result in
> more work than would have been required if mincount=0. For this reason, the
> decrease in required work when mincount=1 is *always* either a moot point or
> a net win.
> {quote}
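The two cases in the quoted reasoning can be checked with a toy model of a single shard. This is a simplified sketch under stated assumptions, not Solr's facet code: the class ShardFacetSketch and the method topTerms are hypothetical, the per-shard counts are made up, and ties are broken by term order purely to keep the output deterministic.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of one shard answering a facet request sorted by count.
final class ShardFacetSketch {

    /** Returns up to `limit` terms with count >= minCount, highest count first. */
    static Map<String, Integer> topTerms(Map<String, Integer> counts,
                                         int limit, int minCount) {
        Map<String, Integer> out = new LinkedHashMap<>();
        counts.entrySet().stream()
                .filter(e -> e.getValue() >= minCount)
                .sorted(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(limit)
                .forEach(e -> out.put(e.getKey(), e.getValue()));
        return out;
    }

    public static void main(String[] args) {
        // Made-up counts for a hypothetical shardA.
        Map<String, Integer> shardA = Map.of("a", 3, "b", 0, "c", 2, "d", 0);

        // Case 1: fewer terms than the limit come back with mincount=1,
        // so any term missing from the response must have count 0 here.
        System.out.println(topTerms(shardA, 10, 1));

        // Case 2: the limit is filled by terms with count >= 1, so the
        // response matches what mincount=0 would have returned.
        System.out.println(topTerms(shardA, 2, 1).equals(topTerms(shardA, 2, 0)));
    }
}
```

In both cases the shard with mincount=1 never has to consider the zero-count terms "b" and "d", which is where the saving for high-cardinality fields would come from.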
> The belief here is that it is safe to change the default of facet.distrib.mco
> so that facet.mincount=1 will be used when appropriate. The overall
> performance gain can be significant, and no performance cost has been observed.