[jira] [Comment Edited] (SOLR-8096) Major faceting performance regressions

Michael Gibney (JIRA) Fri, 07 Apr 2017 08:46:56 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960982#comment-15960982
 ]


Michael Gibney edited comment on SOLR-8096 at 4/7/17 3:45 PM:
--------------------------------------------------------------

Of course I can't speak to the status of this issue from other folks' 
perspectives, but I did observe a couple of things that I wanted to mention in 
case anyone else might find them useful. Performance has actually been 
acceptable for me, but implementing a simple cache for facets definitely 
improved performance (in my deployment) for common queries (see 
[^facetcache.diff]). A couple of observations:

1. Based on the fact that fields were being faceted using DocValues faceting, I 
assumed (incorrectly) that docValues must have been enabled. In fact, docValues 
are _not_ enabled by default; IndexReaders are wrapped in UninvertingReaders 
that (on demand) uninvert non-docValues-enabled fields in order to present a 
docValues-like interface for faceting. 
2. DocValues cannot yet be enabled on analyzed fields, so if you require this, 
you'll be dealing with the UninvertingReader; you may be interested in 
SOLR-8362.
3. {{DocValuesFacets}} iterates over all the documents in a result set for 
_every_ query. Regardless of the underlying implementation, this is bound to be 
relatively expensive for result sets containing large numbers of documents. 
Furthermore, such large result sets account for a fairly large proportion of 
common user interactions: a landing page with faceting over the whole index 
presents users with a handful of clickable top-level filters, each of which 
covers a large portion of the index. Thus, faceting seems to be a good 
candidate for caching, regardless of the underlying implementation of the 
DocValues interface. 
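
To illustrate points 1 and 2, here is a minimal sketch of explicitly enabling 
docValues on a non-analyzed field in the schema, so that faceting on it does 
not have to go through the UninvertingReader. The field name {{category}} is 
hypothetical, and the {{string}} type is assumed to map to {{solr.StrField}} as 
in the stock example schemas; documents indexed before such a change would need 
to be reindexed. 
{code:xml}
<!-- hypothetical field name; docValues is switched on explicitly here -->
<field name="category"
       type="string"
       indexed="true"
       stored="true"
       docValues="true" />
{code}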

Accordingly, I've attached a stab at a patch ([^facetcache.diff]) to 
{{DocValuesFacets}} to support a cache intended to speed up DocValues faceting 
over high-cardinality docsets. Combined with a handful of warming queries, I've 
seen much improved performance for common requests. In addition to applying the 
patch, you must add a cache definition to your solrconfig.xml, e.g.: 
{code:xml}
<cache name="perSegFacetCache"
      class="solr.search.LRUCache"
      size="200"
      initialSize="200"
      autowarmCount="200"
      regenerator="solr.request.PerSegFacetCacheRegenerator" />
{code}
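The warming queries mentioned above might look something like the following 
sketch, which would go in the {{<query>}} section of solrconfig.xml. 
{{QuerySenderListener}} is stock Solr; the field name {{category}} and the 
choice of {{*:*}} as the query are assumptions for illustration only, and 
should be replaced with whatever your common large-docset requests actually 
are. 
{code:xml}
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- hypothetical warming query: facet one field over the whole index -->
    <lst>
      <str name="q">*:*</str>
      <str name="rows">0</str>
      <str name="facet">true</str>
      <str name="facet.field">category</str>
    </lst>
  </arr>
</listener>
{code}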
I tried to make the docset cardinality threshold for caching configurable at 
the field level, but haven't yet figured out how to pass in the configuration; 
you will see my unsuccessful attempts reflected in the changes to 
{{SimpleFacets}}. With the patch in its current state, this parameter can only 
be adjusted by changing the hardcoded default of 5000 for 
{{SimpleFacets.DEFAULT_PERSEG_FACET_CACHE_THRESHOLD}} (a reasonable value would 
probably be _much_ higher).

Just to clarify, this comment is not a suggestion to skip closing this issue, 
and I'm sorry if it's a bit off-topic; I hope it strikes people as related 
enough to justify posting here. 


was (Author: mgibney):
Of course I can't speak to the status of this issue from other folks' 
perspectives, but I did observe a couple of things that I wanted to mention in 
case anyone else might find them useful. Performance has actually been 
acceptable for me, but implementing a simple cache for facets definitely 
improved performance (in my deployment) for common queries (see 
[^facetcache.diff]). A couple of observations:
1. Based on the fact that fields were being faceted using DocValues faceting, I 
assumed (incorrectly) that docValues must have been enabled. In fact, docValues 
are _not_ enabled by default; IndexReaders are wrapped in UninvertingReaders 
that (on demand) uninvert non-docValues-enabled fields in order to present a 
docValues-like interface for faceting. 
2. DocValues cannot yet be enabled on analyzed fields, so if you require this, 
you'll be dealing with the UninvertingReader; you may be interested in 
SOLR-8362.
3. {{DocValuesFacets}} iterates over all the documents in a result set for 
_every_ query. Regardless of the underlying implementation, this is bound to be 
relatively expensive for result sets containing large numbers of documents. 
Furthermore, "Result sets containing large numbers of documents" constitute a 
fairly large proportion of common user interactions (landing page with faceting 
over the whole index presents users with a handful of clickable top-level 
filters, each of which covers a large portion of the index). Thus, faceting 
seems to be a good candidate for caching, regardless of the underlying 
implementation of the DocValues interface. 

Accordingly, I've attached a stab at a patch ([^facetcache.diff]) to 
{{DocValuesFacets}} to support a cache intended to speed dv faceting over 
high-cardinality docsets. Combined with a handful of warming queries, I've seen 
much improved performance for common requests. In addition to the patch, you 
must configure your solrconfig.xml with, e.g., 
{code:xml}
<cache name="perSegFacetCache"
      class="solr.search.LRUCache"
      size="200"
      initialSize="200"
      autowarmCount="200"
      regenerator="solr.request.PerSegFacetCacheRegenerator" />
{code}
I tried to make the docset cardinality threshold for caching configurable at 
the field level, but haven't yet figured out how to pass in the configuration 
(you will see my unsuccessful attempts reflected in the changes to 
{{SimpleFacets}} -- with the patch in current state, if you want to adjust this 
parameter, it can only be done by changing the hardcoded default of 5000 for 
{{SimpleFacets.DEFAULT_PERSEG_FACET_CACHE_THRESHOLD}}).
Just to clarify, this comment is not a suggestion to skip closing this issue, 
and I'm sorry if it's a bit off-topic; I hope it strikes people as related 
enough to justify posting here. 

> Major faceting performance regressions
> --------------------------------------
>
>                 Key: SOLR-8096
>                 URL: https://issues.apache.org/jira/browse/SOLR-8096
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 5.0, 5.1, 5.2, 5.3, 6.0
>            Reporter: Yonik Seeley
>            Priority: Critical
>         Attachments: facetcache.diff, simple_facets.diff
>
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was removed as part of LUCENE-5666, causing 
> severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time, by 
> percent of index being faceted:
> ||num_unique_values|| 10%     || 50%      || 90%      ||
> |10                 | 351.17% | 1587.08% | 3057.28% |
> |100                | 158.10% | 203.61%  | 1421.93% |
> |1000               | 143.78% | 168.01%  | 1325.87% |
> |10000              | 137.98% | 175.31%  | 1233.97% |
> |100000             | 142.98% | 159.42%  | 1252.45% |
> |1000000            | 255.15% | 165.17%  | 1236.75% |
> For example, for a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
