[
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16152866#comment-16152866
]
Guenter Hipler commented on SOLR-8096:
--------------------------------------
I ran a lot of tests over the last few days. For some of them I could use old
archived queries from our production system (based on 4.10) together with the
original query times, so I was able to compare the processing times. My findings:
* using the uif method for multi-valued fields, but without docvalues (this
doesn't work at all), seems to solve most of our current use cases
* [~emaijala] >> Trying to use facet.method=uif with a solr.DateRangeField
causes the following exception: <<
We use only int types for publication dates, and this works for range facets.
Perhaps that is an option for you?
* all our disks are SSDs. The index is not cached in memory; that
wouldn't be possible for us with a 110G index
* So in general I think our overdue upgrade from version 4.10 to 6.x might now
be an option
* the use case described by [~emaijala], where more than 200 facet buckets cause
a performance penalty, is from my point of view not very common, so I guess/hope
we can live with this
* But I have one major concern:
I think it's problematic if we have to run an aggressive segment-merging policy
quite often, because merging is really resource intensive
* my question:
Yonik, do you have an idea or plan for how to unify (bring
together) the diverged developments in the Lucene area (docvalues) with the
current Solr facet algorithms? I don't think making only a few optimizations
here and there is an option, at least in the medium term.
I would be happy to support this process with hints and metrics from the user
side
Günter
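
For readers who want to try the two approaches mentioned in the comment above (facet.method=uif on a multi-valued field, and range faceting on an int field that holds the publication year), here is a minimal sketch of how such requests could be assembled. The base URL, collection name, and field names are hypothetical placeholders and do not come from this thread; the request parameters are standard Solr faceting parameters. The code only builds the URLs and does not contact a server.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: adjust host, port, and collection to your own setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycollection/select"


def build_field_facet_url(base_url, field, query="*:*", method="uif", limit=20):
    """Build a select URL that facets on `field` using the given facet.method."""
    params = [
        ("q", query),
        ("rows", 0),               # only the facet counts are of interest
        ("facet", "true"),
        ("facet.field", field),
        ("facet.method", method),  # "uif" requests the UnInvertedField method
        ("facet.limit", limit),
        ("wt", "json"),
    ]
    return base_url + "?" + urlencode(params)


def build_int_range_facet_url(base_url, field, start, end, gap, query="*:*"):
    """Build a select URL that range-facets on a numeric (int) field."""
    params = [
        ("q", query),
        ("rows", 0),
        ("facet", "true"),
        ("facet.range", field),
        ("facet.range.start", start),
        ("facet.range.end", end),
        ("facet.range.gap", gap),
        ("wt", "json"),
    ]
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # "subject_facet" and "publish_year" are made-up field names.
    print(build_field_facet_url(SOLR_SELECT_URL, "subject_facet"))
    print(build_int_range_facet_url(SOLR_SELECT_URL, "publish_year", 1900, 2020, 10))
```

Comparing the response times of the first URL with method="uif" and with another method such as "fc" on the same queries is one way to reproduce the kind of comparison described in the comment.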
> Major faceting performance regressions
> --------------------------------------
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
> Issue Type: Bug
> Affects Versions: 5.0, 5.1, 5.2, 5.3, 6.0
> Reporter: Yonik Seeley
> Priority: Critical
> Attachments: facetcache.diff, simple_facets.diff
>
>
> Use of the highly optimized faceting that Solr had for multi-valued fields
> over relatively static indexes was removed as part of LUCENE-5666, causing
> severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index,
> with each field having between 0 and 5 values per document. *Higher numbers
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time
> ||...................................|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10 | 351.17% | 1587.08% | 3057.28% |
> |100 | 158.10% | 203.61% | 1421.93% |
> |1000 | 143.78% | 168.01% | 1325.87% |
> |10000 | 137.98% | 175.31% | 1233.97% |
> |100000 | 142.98% | 159.42% | 1252.45% |
> |1000000 | 255.15% | 165.17% | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were
> faceted.
> One user who brought the performance problem to our attention:
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in
> SOLR-7190, but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request
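
The percentages in the benchmark table quoted above can be easier to read as slowdown factors. The following is a small sketch of that arithmetic; the numbers are copied from the table, while the function and variable names are illustrative only.

```python
# Percent of the Solr 4.10.3 faceting time taken by Solr 5.4_dev, copied from
# the table above. Keys are num_unique_values; each tuple holds the values for
# 10%, 50%, and 90% of the index being faceted.
COVERAGE_PERCENT = (10, 50, 90)
BENCHMARK_PERCENT = {
    10: (351.17, 1587.08, 3057.28),
    100: (158.10, 203.61, 1421.93),
    1000: (143.78, 168.01, 1325.87),
    10000: (137.98, 175.31, 1233.97),
    100000: (142.98, 159.42, 1252.45),
    1000000: (255.15, 165.17, 1236.75),
}


def slowdown_factor(percent_of_4x_time):
    """Convert 'percent of the 4.x time' into a multiplier (100% -> 1.0)."""
    return percent_of_4x_time / 100.0


def worst_case(table):
    """Return (num_unique_values, coverage_percent, percent) for the largest entry."""
    entries = (
        (num_values, coverage, percent)
        for num_values, row in table.items()
        for coverage, percent in zip(COVERAGE_PERCENT, row)
    )
    return max(entries, key=lambda entry: entry[2])


if __name__ == "__main__":
    for num_values, row in BENCHMARK_PERCENT.items():
        factors = ", ".join(f"{slowdown_factor(p):.1f}x" for p in row)
        print(f"{num_values:>8} unique values: {factors}")
    print("worst case:", worst_case(BENCHMARK_PERCENT))
```

Read this way, the row for 1000 unique values at 10% coverage (143.78%) means that 5.x took roughly 1.4 times as long as 4.x, and the largest entry in the table (10 unique values at 90% coverage) corresponds to a slowdown of roughly 30 times.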
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)