[jira] [Commented] (SOLR-6803) Pivot Performance

Neil Ireson (JIRA) Tue, 12 May 2015 04:30:07 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539672#comment-14539672
 ]


Neil Ireson commented on SOLR-6803:
-----------------------------------

I also made the naive change of removing the offending line from the code, by 
replacing

{code}
        DocSet subset = getSubset(docs, sfield, fieldValue);
{code}
with
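{code}
        // getSubset() is only needed for a deeper pivot level (subField)
        // or for per-pivot stats, so skip it otherwise
        DocSet subset = null;
        if ( subField != null || ((isShard || 0 < pivotCount) && ! statsFields.isEmpty()) ) {
          subset = getSubset(docs, sfield, fieldValue);
        }
{code}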
This was just to show that, with this change, the pivot still gives the best results.

| Values (#) |  Combined (ms) |     Facet (ms) |     Pivot (ms) |
| 100        |       202 |       133 |        67 |
| 1000       |       215 |       183 |        73 |
| 10000      |       255 |       392 |       145 |
| 100000     |       464 |      1301 |       395 |
| 500000     |      1307 |      4458 |      1179 |
| 1000000    |      2471 |      7783 |      2148 |

Note that with this change the code passed all the compile tests, so it's still 
not clear to me why getSubset has to be called every time. 
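To make the effect of the guard concrete, here is a minimal, self-contained sketch that models only the condition from the change above. It assumes, as the change does, that the subset is consumed only by a deeper pivot level (`subField`) or by per-pivot stats. The class, the `needsSubset` and `subsetCalls` helpers, and the stats field name used in the usage note are hypothetical stand-ins, not Solr code; the sketch simply counts how many subset computations a pivot level would perform with and without the guard.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical model of the guard around getSubset(); not Solr code.
public class SubsetGuardSketch {

    // Mirrors the patched condition: a subset is needed only when there is
    // a deeper pivot field, or when stats must be computed for this value.
    static boolean needsSubset(String subField, boolean isShard,
                               int pivotCount, List<String> statsFields) {
        return subField != null
            || ((isShard || 0 < pivotCount) && !statsFields.isEmpty());
    }

    // Counts subset computations for `values` distinct field values at one
    // pivot level. pivotCount is held constant for simplicity; in a real
    // pivot it is the document count of each individual value.
    static int subsetCalls(int values, boolean guarded, String subField,
                           boolean isShard, int pivotCount, List<String> statsFields) {
        int calls = 0;
        for (int i = 0; i < values; i++) {
            if (!guarded || needsSubset(subField, isShard, pivotCount, statsFields)) {
                calls++; // stands in for getSubset(docs, sfield, fieldValue)
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        List<String> noStats = Collections.emptyList();
        // Leaf level of a term/time pivot (no sub-field, no stats), unguarded.
        System.out.println(subsetCalls(1000, false, null, false, 5, noStats)); // 1000
        // Same level with the guard: no subsets are computed.
        System.out.println(subsetCalls(1000, true, null, false, 5, noStats));  // 0
        // Top level: each term's subset is still needed to pivot on time.
        System.out.println(subsetCalls(1000, true, "time", false, 5, noStats)); // 1000
    }
}
```

In this model the saving comes entirely from the leaf level, which is also the level with the most values (terms × times), so the unconditional call would be expected to dominate as the number of pivot values grows. With a stats field present (for example a hypothetical `price` field), `needsSubset` returns true whenever `isShard` is set or `pivotCount` is positive, so the stats path is unaffected.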


> Pivot Performance
> -----------------
>
>                 Key: SOLR-6803
>                 URL: https://issues.apache.org/jira/browse/SOLR-6803
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 5.1
>            Reporter: Neil Ireson
>            Priority: Minor
>         Attachments: PivotPerformanceTest.java
>
>
> I found that my pivot search for terms per day was taking ages, so I knocked 
> up a quick test, using a collection of 1 million documents with varying 
> numbers of random terms and times, to compare different ways of getting the 
> counts.
> 1) Combined = combining the term and time in a single field.
> 2) Facet = for each term, set the query to that term and then get the time 
> facet.
> 3) Pivot = use the term/time pivot facet.
> The following two tables present the results for versions 4.9.1 and 4.10.1, 
> averaged over five runs.
> 4.9.1 (Processing time in ms)
> |Values (#)   |  Combined (ms)|     Facet (ms)|     Pivot (ms)|
> |100       |        22|        21|        52|
> |1000      |       178|        57|       115|
> |10000     |      1363|       211|       310|
> |100000    |      2592|      1009|       978|
> |500000    |      3125|      3753|      2476|
> |1000000   |      3957|      6789|      3725|
> 4.10.1 (Processing time in ms)
> |Values (#)   |  Combined (ms)|     Facet (ms)|     Pivot (ms)|
> |100       |        21|        21|        75|
> |1000      |       188|        60|       265|
> |10000     |      1438|       215|      1826|
> |100000    |      2768|      1073|     16594|
> |500000    |      3266|      3686|     99682|
> |1000000   |      4080|      6777|    208873|
> The results show that, as the number of pivot values (i.e. number of terms * 
> number of times) increases, pivot performance in 4.10.1 gets progressively 
> worse.
> I tried to look at the code, but there were a lot of changes in pivoting 
> between 4.9 and 4.10, so it is not clear to me what has caused the 
> performance issues. However, the results seem to indicate that if the pivot 
> were simply a combined facet search, it could potentially produce better and 
> more robust performance.
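The three counting approaches, and the idea of treating the pivot as a combined facet search, can be illustrated with a small in-memory sketch. It uses plain Java collections rather than Solr, and assumes each document has exactly one term and one time; the `Doc` class, the `|` separator and the method names are hypothetical choices for illustration. Combined counts one key per term/time pair, Facet filters the documents once per term (standing in for one query per term) and counts times, and Pivot builds nested term → time counts. Splitting the combined keys back apart yields the same nested counts as the pivot.

```java
import java.util.*;

// Hypothetical in-memory model of the three counting approaches; not Solr code.
public class PivotCountsSketch {

    // Hypothetical document with a single term and a single time value.
    static final class Doc {
        final String term;
        final String time;
        Doc(String term, String time) { this.term = term; this.time = time; }
    }

    // Assumed separator for the combined field; must not occur in terms.
    static final String SEP = "|";

    // 1) Combined: one count per "term|time" key.
    static Map<String, Integer> combined(List<Doc> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Doc d : docs) counts.merge(d.term + SEP + d.time, 1, Integer::sum);
        return counts;
    }

    // 2) Facet: for each distinct term, filter the docs (one "query" per
    //    term), then count the times of the matching docs.
    static Map<String, Map<String, Integer>> facet(List<Doc> docs) {
        Set<String> terms = new TreeSet<>();
        for (Doc d : docs) terms.add(d.term);
        Map<String, Map<String, Integer>> result = new TreeMap<>();
        for (String term : terms) {
            Map<String, Integer> timeCounts = new TreeMap<>();
            for (Doc d : docs) {
                if (d.term.equals(term)) timeCounts.merge(d.time, 1, Integer::sum);
            }
            result.put(term, timeCounts);
        }
        return result;
    }

    // 3) Pivot: nested term -> time -> count built in a single pass.
    static Map<String, Map<String, Integer>> pivot(List<Doc> docs) {
        Map<String, Map<String, Integer>> result = new TreeMap<>();
        for (Doc d : docs) {
            result.computeIfAbsent(d.term, k -> new TreeMap<>())
                  .merge(d.time, 1, Integer::sum);
        }
        return result;
    }

    // Rebuild the nested pivot from combined counts by splitting each key.
    static Map<String, Map<String, Integer>> pivotFromCombined(Map<String, Integer> combined) {
        Map<String, Map<String, Integer>> result = new TreeMap<>();
        for (Map.Entry<String, Integer> e : combined.entrySet()) {
            String key = e.getKey();
            int i = key.indexOf(SEP);
            result.computeIfAbsent(key.substring(0, i), k -> new TreeMap<>())
                  .put(key.substring(i + SEP.length()), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
            new Doc("a", "day1"), new Doc("a", "day1"),
            new Doc("a", "day2"), new Doc("b", "day2"));
        System.out.println(combined(docs)); // {a|day1=2, a|day2=1, b|day2=1}
        System.out.println(pivot(docs));    // {a={day1=2, day2=1}, b={day2=1}}
        System.out.println(facet(docs).equals(pivot(docs)));                    // true
        System.out.println(pivotFromCombined(combined(docs)).equals(pivot(docs))); // true
    }
}
```

The sketch only shows that the three approaches agree on the counts; it says nothing about their relative cost inside Solr, which is what the timings above measure.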



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
