[
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149165#comment-16149165
]
Shawn Heisey commented on SOLR-8096:
------------------------------------
This issue is not the place to discuss how we got here or who might be to
blame.
The fact is that current Solr versions have a major performance regression for
faceting, and probably for other things like grouping. In the last couple of
weeks, someone on the solr-user mailing list has encountered very slow results
with our most recent version (6.6.0 right now) compared to 4.x versions. For
them, enabling docValues, which is supposed to be the magic bullet for faceting
performance, causes performance to get even worse.
If I had any understanding of how this code worked and the precise reasons it
has become slower, I would be working on a solution. For those Solr committers
who *do* know that part of the code: Is there anything a user can do to speed
this up? Is there anything we can do in the Solr code to fix the regression?
Possibly insane idea: Can Solr leverage the faceting code in Lucene itself?
> Major faceting performance regressions
> --------------------------------------
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
> Issue Type: Bug
> Affects Versions: 5.0, 5.1, 5.2, 5.3, 6.0
> Reporter: Yonik Seeley
> Priority: Critical
> Attachments: facetcache.diff, simple_facets.diff
>
>
> Use of the highly optimized faceting that Solr had for multi-valued fields
> over relatively static indexes was removed as part of LUCENE-5666, causing
> severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index,
> with each field having between 0 and 5 values per document. *Higher numbers
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time
> ||...................................|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10 | 351.17% | 1587.08% | 3057.28% |
> |100 | 158.10% | 203.61% | 1421.93% |
> |1000 | 143.78% | 168.01% | 1325.87% |
> |10000 | 137.98% | 175.31% | 1233.97% |
> |100000 | 142.98% | 159.42% | 1252.45% |
> |1000000 | 255.15% | 165.17% | 1236.75% |
> For example, for a field with 1000 unique values in the whole index,
> faceting with 5x took 143.78% of the 4x time when ~10% of the docs in the
> index were faceted.
> One user who brought the performance problem to our attention:
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in
> SOLR-7190, but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request
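
As a reading aid for the table in the issue description (not part of the original benchmark), the percentages can be turned into plain slowdown factors. The sketch below only copies the numbers from the table; the names PERCENT_OF_4X_TIME, slowdown_factor and worst_case are hypothetical and chosen for illustration.

```python
# Values copied from the table in the issue description: Solr 5.4_dev
# faceting time as a percent of Solr 4.10.3 faceting time.
# Key: num_unique_values. Tuple: 10%, 50%, 90% of the index being faceted.
PERCENT_OF_4X_TIME = {
    10: (351.17, 1587.08, 3057.28),
    100: (158.10, 203.61, 1421.93),
    1000: (143.78, 168.01, 1325.87),
    10000: (137.98, 175.31, 1233.97),
    100000: (142.98, 159.42, 1252.45),
    1000000: (255.15, 165.17, 1236.75),
}

# Column labels in the same order as the tuples above.
FACETED_FRACTIONS = ("10%", "50%", "90%")


def slowdown_factor(percent):
    """Convert 'percent of 4x time' into a multiplier (100% means equal)."""
    return percent / 100.0


def worst_case(table):
    """Return (num_unique_values, faceted_fraction, factor) for the largest slowdown."""
    worst = None
    for num_unique, row in table.items():
        for fraction, percent in zip(FACETED_FRACTIONS, row):
            factor = slowdown_factor(percent)
            if worst is None or factor > worst[2]:
                worst = (num_unique, fraction, factor)
    return worst


if __name__ == "__main__":
    for num_unique, row in PERCENT_OF_4X_TIME.items():
        factors = ", ".join(f"{slowdown_factor(p):.2f}x" for p in row)
        print(f"{num_unique:>8} unique values: {factors}")
    print("worst case:", worst_case(PERCENT_OF_4X_TIME))
```

Read this way, every cell in the table is a slowdown (all values are above 100%), and the largest one is the 10-unique-value field with 90% of the index faceted.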