[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150525#comment-16150525 ]

Ere Maijala commented on SOLR-8096:
-----------------------------------

Chiming in as one of those affected by performance issues with faceting. I've 
been testing with a 57 million record index of bibliographic data. A faceting 
request that used to take around 20ms in Solr 4.10.2 now takes at least 2600ms 
in Solr 6.6.0. While in general I find it fine to change the default behavior to 
something that works better than before for the majority of use cases, there 
should be a way to maintain performance in the other cases.

My main issue at the moment is that even facet.method=uif is slow if you 
request more than a few items. In a smaller test index of 6 million records I 
can get the top 20 results in 4ms, but facet.limit=200 takes ~100ms and 
facet.limit=2000 takes ~1300ms (the facet has 1960 buckets). Parameters used 
for the query:

q=*:*&rows=0&facet=true&facet.field=building&facet.mincount=1&facet.limit=[20-2000]&debugQuery=true&facet.method=uif
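
The facet.limit sweep above is easy to script. The following is a minimal plain-Java sketch that only builds the URL-encoded query string for each facet.limit value. The class and method names are hypothetical, and the field name `building` is taken from the query above. Sending the request to a Solr core and reading the timings from the response are left out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: rebuilds the query from this comment, varying only facet.limit.
final class FacetLimitSweep {
    static String build(String field, int facetLimit) {
        // LinkedHashMap keeps the parameters in the same order as the original query.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        params.put("rows", "0");
        params.put("facet", "true");
        params.put("facet.field", field);
        params.put("facet.mincount", "1");
        params.put("facet.limit", Integer.toString(facetLimit));
        params.put("debugQuery", "true");
        params.put("facet.method", "uif");

        // URL-encode each key and value and join them with '&'.
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // The three facet.limit values measured above.
        for (int limit : new int[] {20, 200, 2000}) {
            System.out.println(build("building", limit));
        }
    }
}
```

Keeping everything except facet.limit fixed means any timing difference between the generated requests comes from that one parameter.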
 

Anyway, here's a list of issues that, for me, seem to contribute to all the 
confusion around faceting performance:

# As far as I can see, facet.method=uif is completely undocumented apart from a 
short entry in release notes.
# Also undocumented is the fact (as observed during testing) that docValues 
must not be enabled for facet.method=uif to do any good. Otherwise the 
performance can be even worse than with FC.
# There's no proper documentation on what the introduction of docValues means 
in practice. There are several articles about the benefits it brings, but I 
couldn't find much analysis of any possible downsides.
# facet.method=uif with Solr 6.6.0 is still very slow compared to that in Solr 
4.10.2 if you request more than a few entries.
# There was no way to get back UIF before SOLR-8466.
# Changes in behavior haven't really been documented. This is how the 
introduction of docValues was documented in the release notes of Solr 4.2.0: 
"SOLR-3855, SOLR-4490: Doc values support". That doesn't help a poor developer 
like me to get the big picture. Then I read in 
https://lucidworks.com/2013/04/02/fun-with-docvalues-in-solr-4-2/ that compared 
to what we used to have _"DocValues aim to alleviate both of these problems 
while keeping performance comparable."_ Of course that's just something I read 
on the internet, but so far it's the best description of docValues I've read and 
makes it sound like there won't be significant performance differences.
# It should be possible to make an informed decision to go with something that 
uses more JVM memory and is slower to warm up if required by the use-case. This 
is difficult because information is so scattered and the Solr reference guide 
doesn't go into much detail. For instance the effect of docValues is not 
mentioned in the reference guide where facet.method is described.
# Solr's documentation on DocValues 
(https://lucene.apache.org/solr/guide/6_6/docvalues.html) highlights the 
positive effects it has on performance, memory consumption etc. It starts with 
_"DocValues are a way of recording field values internally that is more 
efficient for some purposes, such as sorting and faceting, than traditional 
indexing."_ That sounds like something you should enable as quickly as possible 
to reap the benefits!
# Discussions about docValues on the solr-user list also mostly recommend 
enabling docValues without discussing any caveats.
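
The following is a deliberately simplified toy model of uninverted-field faceting in Java. It is meant to make the facet.method=uif discussion above a little more concrete. Each document's values are kept as term ordinals, and counts are accumulated into an int array over the matching documents. A bounded heap then picks the top facet.limit buckets. The class and method names are hypothetical. This is not Solr's UnInvertedField implementation, whose data layout and optimizations differ; it only shows the general shape of the approach.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical toy model, not Solr's UnInvertedField: each document's terms are
// stored as a list of term ordinals, so counting never touches the term strings
// until the final top-N step.
final class ToyUninvertedFacet {
    private final String[] termsByOrd; // unique terms; array index = ordinal
    private final int[][] ordsByDoc;   // per-document term ordinals (multi-valued field)

    ToyUninvertedFacet(String[] termsByOrd, int[][] ordsByDoc) {
        this.termsByOrd = termsByOrd;
        this.ordsByDoc = ordsByDoc;
    }

    // Returns up to `limit` "term:count" entries, highest count first (ties: lower ordinal first).
    // A negative limit means "no limit", mirroring facet.limit semantics.
    List<String> facet(BitSet matchingDocs, int minCount, int limit) {
        // Pass 1: one counter per term ordinal, incremented for every matching doc.
        int[] counts = new int[termsByOrd.length];
        for (int doc = matchingDocs.nextSetBit(0); doc >= 0; doc = matchingDocs.nextSetBit(doc + 1)) {
            for (int ord : ordsByDoc[doc]) {
                counts[ord]++;
            }
        }
        // Pass 2: bounded min-heap; the head is the weakest candidate (lowest count,
        // then highest ordinal) and is evicted whenever the heap grows past the limit.
        int max = limit < 0 ? Integer.MAX_VALUE : limit;
        PriorityQueue<Integer> heap = new PriorityQueue<>((a, b) ->
                counts[a] != counts[b] ? Integer.compare(counts[a], counts[b]) : Integer.compare(b, a));
        for (int ord = 0; ord < counts.length; ord++) {
            if (counts[ord] < minCount) {
                continue;
            }
            heap.offer(ord);
            if (heap.size() > max) {
                heap.poll();
            }
        }
        // Drain weakest-first, then reverse so the strongest bucket comes first.
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            int ord = heap.poll();
            result.add(termsByOrd[ord] + ":" + counts[ord]);
        }
        Collections.reverse(result);
        return result;
    }
}
```

In this toy model the counting pass does the same work regardless of facet.limit. Only the selection step grows with it, at roughly O(n log limit) over the n buckets that pass facet.mincount.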

> Major faceting performance regressions
> --------------------------------------
>
>                 Key: SOLR-8096
>                 URL: https://issues.apache.org/jira/browse/SOLR-8096
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 5.0, 5.1, 5.2, 5.3, 6.0
>            Reporter: Yonik Seeley
>            Priority: Critical
>         Attachments: facetcache.diff, simple_facets.diff
>
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was removed as part of LUCENE-5666, causing 
> severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time, by percent of the index being faceted:
> ||num_unique_values||10% faceted||50% faceted||90% faceted||
> | 10 | 351.17% | 1587.08% | 3057.28% |
> | 100 | 158.10% | 203.61% | 1421.93% |
> | 1000 | 143.78% | 168.01% | 1325.87% |
> | 10000 | 137.98% | 175.31% | 1233.97% |
> | 100000 | 142.98% | 159.42% | 1252.45% |
> | 1000000 | 255.15% | 165.17% | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
