[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150525#comment-16150525 ]
Ere Maijala commented on SOLR-8096:
-----------------------------------

Chiming in as one of those affected by performance issues with faceting. I've been testing with a 57 million record index of bibliographic data. A faceting request that used to take around 20ms in Solr 4.10.2 takes at least 2600ms in Solr 6.6.0. While in general I find it fine to change the default behavior to something that works better than before for a majority of use cases, there should be a way to maintain performance in other cases.

My main issue at the moment is that even facet.method=uif is slow if you request more than a few items. In a smaller test index of 6 million records I can get the top 20 results in 4ms, but facet.limit=200 takes ~100ms and facet.limit=2000 takes ~1300ms (the facet has 1960 buckets). Params used for the query:

q=*:*&rows=0&facet=true&facet.field=building&facet.mincount=1&facet.limit=[20-2000]&debugQuery=true&facet.method=uif

Anyway, here's a list of issues that, for me, seem to contribute to all the confusion around faceting performance:

# As far as I can see, facet.method=uif is completely undocumented apart from a short entry in release notes.
# Also undocumented is the fact (as observed during testing) that docValues must not be enabled for facet.method=uif to do any good. Otherwise the performance can be even worse than with FC.
# There's no proper documentation on what the introduction of docValues means in practice. There are several articles about what good it brings but I couldn't find much analysis of any possible downsides.
# facet.method=uif with Solr 6.6.0 is still very slow compared to that in Solr 4.10.2 if you request more than a few entries.
# There was no way to get back UIF before SOLR-8466.
# Changes in behavior haven't really been documented. This is how the introduction of docValues was documented in the release notes of Solr 4.2.0: "SOLR-3855, SOLR-4490: Doc values support".
That doesn't help a poor developer like me to get the big picture. Then I read in https://lucidworks.com/2013/04/02/fun-with-docvalues-in-solr-4-2/ that compared to what we used to have _"DocValues aim to alleviate both of these problems while keeping performance comparable."_ Of course that's just something I read on the internet, but so far it's the best description of docValues I've read, and it makes it sound like there won't be significant performance differences.
# It should be possible to make an informed decision to go with something that uses more JVM memory and is slower to warm up if required by the use case. This is difficult because information is so scattered and the Solr reference guide doesn't go into much detail. For instance, the effect of docValues is not mentioned in the reference guide where facet.method is described.
# Solr's documentation on DocValues (https://lucene.apache.org/solr/guide/6_6/docvalues.html) highlights the positive effects it has on performance, memory consumption etc. It starts with _"DocValues are a way of recording field values internally that is more efficient for some purposes, such as sorting and faceting, than traditional indexing."_ That sounds like something you should enable as quickly as possible to reap the benefits!
# Discussions about docValues on the solr-user list also mostly recommend enabling docValues without discussing any caveats.

> Major faceting performance regressions
> --------------------------------------
>
>                 Key: SOLR-8096
>                 URL: https://issues.apache.org/jira/browse/SOLR-8096
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 5.0, 5.1, 5.2, 5.3, 6.0
>            Reporter: Yonik Seeley
>            Priority: Critical
>         Attachments: facetcache.diff, simple_facets.diff
>
>
> Use of the highly optimized faceting that Solr had for multi-valued fields
> over relatively static indexes was removed as part of LUCENE-5666, causing
> severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index,
> with each field having between 0 and 5 values per document. *Higher numbers
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time
> ||...................................|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10 | 351.17% | 1587.08% | 3057.28% |
> |100 | 158.10% | 203.61% | 1421.93% |
> |1000 | 143.78% | 168.01% | 1325.87% |
> |10000 | 137.98% | 175.31% | 1233.97% |
> |100000 | 142.98% | 159.42% | 1252.45% |
> |1000000 | 255.15% | 165.17% | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were
> faceted.
> One user who brought the performance problem to our attention:
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in
> SOLR-7190, but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
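
For anyone who wants to repeat the facet.limit comparison from the comment above, here is a rough Python sketch that only builds the request URLs with the parameters quoted there. The host, port and the core name "biblio" are assumptions for illustration and are not from the original report; the "building" field and the limits 20, 200 and 2000 are the ones mentioned in the comment.

```python
from urllib.parse import urlencode

# Assumption: a local Solr instance with a core named "biblio" (hypothetical).
SOLR_SELECT_URL = "http://localhost:8983/solr/biblio/select"


def facet_url(limit, method="uif", field="building"):
    """Build the facet request quoted in the comment for one facet.limit value."""
    params = [
        ("q", "*:*"),
        ("rows", 0),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.mincount", 1),
        ("facet.limit", limit),
        ("debugQuery", "true"),
        ("facet.method", method),
    ]
    # urlencode percent-encodes the values, so "*:*" becomes "%2A%3A%2A".
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # The three limits compared in the comment.
    for limit in (20, 200, 2000):
        print(facet_url(limit))
```

Passing method="fc" instead of the default would give the matching URLs for a side-by-side comparison with the FC method. Timings would still have to be read from the debugQuery output of each response.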