[
https://issues.apache.org/jira/browse/SOLR-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539672#comment-14539672
]
Neil Ireson commented on SOLR-6803:
-----------------------------------
I also made the naive change of skipping the offending line whenever its result is not used, by
replacing
{code}
DocSet subset = getSubset(docs, sfield, fieldValue);
{code}
with
{code}
// Only compute the subset when it is used: by the next pivot level or by stats.
DocSet subset = null;
if (subField != null || ((isShard || 0 < pivotCount) && !statsFields.isEmpty())) {
  subset = getSubset(docs, sfield, fieldValue);
}
{code}
This shows that, with this change, the pivot still provides the best results:
| Values (#) | Combined (ms) | Facet (ms) | Pivot (ms) |
| 100 | 202 | 133 | 67 |
| 1000 | 215 | 183 | 73 |
| 10000 | 255 | 392 | 145 |
| 100000 | 464 | 1301 | 395 |
| 500000 | 1307 | 4458 | 1179 |
| 1000000 | 2471 | 7783 | 2148 |
Note that with this change the code still passed all the compile tests, so it's
still not clear to me why getSubset has to be called every time.
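To illustrate what the guard above saves, here is a minimal, self-contained Java sketch. It does not use Solr's real `DocSet` or `getSubset`; `computeSubset` and `walkLevel` are hypothetical stand-ins for one level of the pivot loop. It counts how many subset computations happen when the subset is built for every value (the original code) versus only when a sub-pivot or stats will consume it (the patched code).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one level of the pivot loop; this is not Solr code.
public class LazySubsetDemo {

    static int subsetCalls = 0;

    // Stand-in for getSubset(docs, sfield, fieldValue), which is expensive in Solr.
    static List<Integer> computeSubset(int value) {
        subsetCalls++;
        List<Integer> subset = new ArrayList<>();
        subset.add(value);
        return subset;
    }

    // Walks `values` pivot values and returns how many subsets were computed.
    // hasSubField mirrors `subField != null`; wantStats mirrors the stats condition.
    static int walkLevel(int values, boolean hasSubField, boolean wantStats, boolean guarded) {
        subsetCalls = 0;
        int consumed = 0;
        for (int v = 0; v < values; v++) {
            List<Integer> subset = null;
            if (!guarded || hasSubField || wantStats) {
                subset = computeSubset(v); // eager: always; guarded: only when used
            }
            if (hasSubField) {
                consumed += subset.size(); // would recurse into the next pivot level
            }
            if (wantStats) {
                consumed += subset.size(); // would compute stats over the subset
            }
        }
        return subsetCalls;
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println("leaf level, eager:    " + walkLevel(n, false, false, false)); // 1000
        System.out.println("leaf level, guarded:  " + walkLevel(n, false, false, true));  // 0
        System.out.println("inner level, guarded: " + walkLevel(n, true, false, true));   // 1000
    }
}
```

In a term/time pivot the last level has the most values (roughly terms * times), and with no stats requested the guarded version does no subset work there at all, which plausibly accounts for much of the difference in the table above.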
> Pivot Performance
> -----------------
>
> Key: SOLR-6803
> URL: https://issues.apache.org/jira/browse/SOLR-6803
> Project: Solr
> Issue Type: Bug
> Affects Versions: 5.1
> Reporter: Neil Ireson
> Priority: Minor
> Attachments: PivotPerformanceTest.java
>
>
> I found that my pivot search for terms per day was taking ages, so I knocked
> up a quick test, using a collection of 1 million documents with varying
> numbers of random terms and times, to compare different ways of getting the
> counts.
> 1) Combined = combining the term and time in a single field.
> 2) Facet = for each term, set the query to the term and then get the time
> facet.
> 3) Pivot = use the term/time pivot facet.
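The three approaches listed above can be sketched as Solr request parameters. This is not the attached PivotPerformanceTest.java; it is a minimal Java sketch (Java 10+) that assumes hypothetical fields `term` and `day`, plus a hypothetical combined field `term_day` whose values look like `term|day`. The parameters themselves (`q`, `rows`, `facet`, `facet.field`, `facet.pivot`, `facet.limit`, `facet.mincount`) are standard Solr faceting parameters.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical field names: "term" and "day", plus "term_day" holding "term|day".
public class CountStrategies {

    static String encode(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Parameters shared by all three approaches: counts only, no documents.
    static Map<String, String> base(String q) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", q);
        p.put("rows", "0");
        p.put("facet", "true");
        p.put("facet.limit", "-1");   // return every value, not just the default top 100
        p.put("facet.mincount", "1");
        return p;
    }

    // 1) Combined: a single facet over the combined field.
    static String combined() {
        Map<String, String> p = base("*:*");
        p.put("facet.field", "term_day");
        return encode(p);
    }

    // 2) Facet: one request per term, each faceting on day.
    static List<String> perTerm(List<String> terms) {
        List<String> requests = new ArrayList<>();
        for (String t : terms) {
            Map<String, String> p = base("term:\"" + t + "\"");
            p.put("facet.field", "day");
            requests.add(encode(p));
        }
        return requests;
    }

    // 3) Pivot: a single request with a term,day pivot facet.
    static String pivot() {
        Map<String, String> p = base("*:*");
        p.put("facet.pivot", "term,day");
        return encode(p);
    }

    public static void main(String[] args) {
        System.out.println(combined());
        perTerm(List.of("solr", "lucene")).forEach(System.out::println);
        System.out.println(pivot());
    }
}
```

The "Facet" approach issues one request per distinct term, whereas "Combined" and "Pivot" each issue a single request.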
> The following two tables present the results for versions 4.9.1 and 4.10.1,
> each averaged over five runs.
> 4.9.1 (Processing time in ms)
> |Values (#) | Combined (ms)| Facet (ms)| Pivot (ms)|
> |100 | 22| 21| 52|
> |1000 | 178| 57| 115|
> |10000 | 1363| 211| 310|
> |100000 | 2592| 1009| 978|
> |500000 | 3125| 3753| 2476|
> |1000000 | 3957| 6789| 3725|
> 4.10.1 (Processing time in ms)
> |Values (#) | Combined (ms)| Facet (ms)| Pivot (ms)|
> |100 | 21| 21| 75|
> |1000 | 188| 60| 265|
> |10000 | 1438| 215| 1826|
> |100000 | 2768| 1073| 16594|
> |500000 | 3266| 3686| 99682|
> |1000000 | 4080| 6777| 208873|
> The results show that, as the number of pivot values increases (i.e. number
> of terms * number of times), pivot performance in 4.10.1 gets progressively
> worse.
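The growing regression can be quantified directly from the two tables. This small Java program copies the Pivot columns from the 4.9.1 and 4.10.1 tables and computes, for each row, how many times slower 4.10.1 is than 4.9.1.

```java
// Pivot timings (ms) copied from the 4.9.1 and 4.10.1 tables above.
public class PivotSlowdown {
    static final int[] VALUES = {100, 1000, 10000, 100000, 500000, 1000000};
    static final double[] PIVOT_4_9_1 = {52, 115, 310, 978, 2476, 3725};
    static final double[] PIVOT_4_10_1 = {75, 265, 1826, 16594, 99682, 208873};

    // Ratio of 4.10.1 pivot time to 4.9.1 pivot time for each row.
    static double[] slowdown() {
        double[] ratio = new double[VALUES.length];
        for (int i = 0; i < ratio.length; i++) {
            ratio[i] = PIVOT_4_10_1[i] / PIVOT_4_9_1[i];
        }
        return ratio;
    }

    public static void main(String[] args) {
        double[] ratio = slowdown();
        for (int i = 0; i < ratio.length; i++) {
            System.out.printf("%,9d values: %5.1fx slower%n", VALUES[i], ratio[i]);
        }
    }
}
```

The ratio climbs from about 1.4x at 100 values to about 56x at 1,000,000 values, while the Combined and Facet columns stay within a few percent of each other across the two versions.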
> I tried to look at the code, but there were a lot of changes to pivoting
> between 4.9 and 4.10, so it is not clear to me what has caused the
> performance issues. However, the results suggest that if the pivot were
> simply a combined facet search, it could potentially deliver better and
> more robust performance.
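To make the "combined facet search" suggestion concrete, here is a minimal Java sketch of how counts from a single facet over a combined field could be regrouped into the term, then time, shape that a two-level pivot returns. It is not how Solr implements pivots; the `|` separator and the sample values are hypothetical, and it assumes the time part never contains the separator.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class CombinedToPivot {
    // Hypothetical separator between the term and the day in the combined field.
    static final String SEP = "|";

    // Regroups "term|day" -> count into term -> (day -> count).
    static Map<String, Map<String, Long>> toPivot(Map<String, Long> combinedCounts) {
        Map<String, Map<String, Long>> pivot = new TreeMap<>();
        for (Map.Entry<String, Long> e : combinedCounts.entrySet()) {
            // lastIndexOf: a term may contain SEP, the day part is assumed not to.
            int cut = e.getKey().lastIndexOf(SEP);
            if (cut < 0) {
                continue; // skip values without a separator
            }
            String term = e.getKey().substring(0, cut);
            String day = e.getKey().substring(cut + SEP.length());
            pivot.computeIfAbsent(term, k -> new TreeMap<>()).merge(day, e.getValue(), Long::sum);
        }
        return pivot;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new LinkedHashMap<>(); // made-up sample facet counts
        counts.put("solr|2014-11-01", 3L);
        counts.put("solr|2014-11-02", 5L);
        counts.put("lucene|2014-11-01", 2L);
        // prints {lucene={2014-11-01=2}, solr={2014-11-01=3, 2014-11-02=5}}
        System.out.println(toPivot(counts));
    }
}
```

The cost of this regrouping is a single pass over the combined facet values, which is in line with the Combined column scaling far better than the 4.10.1 Pivot column.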
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)