[ 
https://issues.apache.org/jira/browse/SOLR-11711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Houston Putman updated SOLR-11711:
----------------------------------
    Description: 
Currently, when pivot facet requests are sent to each shard, 
{{facet.pivot.mincount}} is set to {{0}} if the facet is sorted by count with a 
specified limit > 0. With a mincount of 0, however, the amount of wasted memory 
grows exponentially with every pivot field added. This is because up to 
{{limit^(# of pivots)}} pivot values are created in memory, even though the 
vast majority of them have counts of 0 and are therefore useless.

Imagine a pivot facet with 3 levels and {{facet.limit=1000}}. A billion pivot 
values will be created, and almost certainly nowhere near a billion of them 
will have counts > 0.
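
To make the arithmetic concrete, here is a minimal sketch of the worst-case 
bound {{limit^(# of pivots)}}. The class and method names are hypothetical and 
exist only for illustration; nothing here is Solr code.

```java
import java.math.BigInteger;

// Hypothetical helper, not part of Solr: computes the worst-case number of
// pivot values a shard could build when every level returns `limit` values.
final class PivotMemoryEstimate {

    // Returns limit^(pivotLevels), the upper bound described in the issue.
    static BigInteger worstCasePivotValues(int limit, int pivotLevels) {
        if (limit < 0 || pivotLevels < 0) {
            throw new IllegalArgumentException("limit and pivotLevels must be >= 0");
        }
        return BigInteger.valueOf(limit).pow(pivotLevels);
    }

    public static void main(String[] args) {
        // facet.limit=1000 with one, two and three pivot levels
        for (int levels = 1; levels <= 3; levels++) {
            System.out.println(levels + " level(s): " + worstCasePivotValues(1000, levels));
        }
    }
}
```

With a limit of 1000 this prints 1000, 1000000 and 1000000000 for one, two and 
three levels respectively.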

This is likely due to the reasoning mentioned in [this comment in the original 
distributed pivot facet 
ticket|https://issues.apache.org/jira/browse/SOLR-2894?focusedCommentId=13979898].
 Basically it was thought that the refinement code would need to know that a 
count was 0 for a shard so that a refinement request wasn't sent to that shard. 
However this is checked in the code, [in this part of the refinement candidate 
checking|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.1.0/solr/core/src/java/org/apache/solr/handler/component/PivotFacetField.java#L275].
 Therefore if the {{pivot.mincount}} was set to 1, the non-existent values 
would either:
* Not be known, because the {{facet.limit}} was smaller than the number of 
facet values with positive counts. This isn't an issue, because they wouldn't 
have been returned with {{pivot.mincount}} set to 0.
* Be known, because the {{facet.limit}} was larger than the number of facet 
values returned. In that case the conditional linked above returns false (since 
we are only talking about pivot facets sorted by count).

The solution is to use the same pivot mincount as would be used if no limit 
was specified. 
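
As a rough sketch of the change (not the actual patch), the per-shard mincount 
choice could look like the following. All names are hypothetical. It also 
assumes that the "no limit" value is {{min(requested mincount, 1)}}, on the 
reasoning that a shard cannot safely filter above 1 because counts from 
several shards are summed; the description above does not state that value.

```java
// Hypothetical sketch, not Solr code: how the mincount sent to each shard
// might be chosen before and after the proposed change.
final class ShardMincountSketch {

    // ASSUMPTION: without a limit, shards are asked for min(requested, 1).
    // A shard-level mincount above 1 could drop values whose per-shard
    // counts only reach the requested mincount once they are summed.
    static int noLimitMincount(int requestedMincount) {
        return Math.min(requestedMincount, 1);
    }

    // Behaviour described as current: a count-sorted facet with limit > 0
    // asks every shard for mincount 0. Other cases are assumed to use the
    // no-limit value.
    static int before(int requestedMincount, int limit, boolean sortByCount) {
        if (sortByCount && limit > 0) {
            return 0;
        }
        return noLimitMincount(requestedMincount);
    }

    // Proposed behaviour: always use the no-limit value, so the limit and
    // the sort order no longer force a mincount of 0.
    static int after(int requestedMincount, int limit, boolean sortByCount) {
        return noLimitMincount(requestedMincount);
    }

    public static void main(String[] args) {
        System.out.println("before: " + before(1, 1000, true)); // mincount 0 sent to shards
        System.out.println("after:  " + after(1, 1000, true));  // mincount 1 sent to shards
    }
}
```

A request that explicitly asks for a mincount of 0 still sends 0 to the shards 
in this sketch, so only the zero-count values nobody asked for are dropped.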

This also relates to a similar problem in field faceting that was "fixed" in 
[SOLR-8988|https://issues.apache.org/jira/browse/SOLR-8988#13324]. The solution 
there was to add a flag, {{facet.distrib.mco}}, which avoids choosing a 
mincount of 0 when it is unnecessary. Since this flag can only improve 
performance and doesn't break any queries, I have removed it as an option and 
changed the code to always use this behavior.


> Improve memory usage of pivot facets
> ------------------------------------
>
>                 Key: SOLR-11711
>                 URL: https://issues.apache.org/jira/browse/SOLR-11711
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: faceting
>    Affects Versions: master (8.0)
>            Reporter: Houston Putman
>              Labels: pull-request-available
>             Fix For: 5.6, 6.7, 7.2
>


