[
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16153967#comment-16153967
]
Shawn Heisey commented on SOLR-8096:
------------------------------------
bq. I never used the so called optimize functionality so far and now realized
that the index is completely rebuild which means e.g. duplication of disk
space. Actually we can't do this because our infrastructure isn't designed for
this.

I think the Solr reference guide may be missing one of the most critical
recommendations with *any* Lucene-based software: Always run with enough disk
space so that your index can triple in size temporarily. This recommendation
is not just for running an optimize -- normal segment merging that happens
during indexing can also double the size of the index temporarily.

There is only one scenario I know of that can actually triple the index size
(temporarily). It is a very specific scenario that may be uncommon in
practice, but does happen in the wild. Therefore perhaps the recommendation
should be amended a little bit to read: "Always run with enough disk space so
your indexes can double in size temporarily, unless you frequently perform
reindexes without deleting all the index data first, in which case you should
allow for the index to triple in size temporarily."
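
To make the arithmetic behind that recommendation concrete, here is a minimal sketch in Python. The function names and the `reindexes_in_place` flag are hypothetical and exist only for this illustration; nothing here is a Solr or Lucene API. It only encodes the 2x/3x rule of thumb stated above.

```python
def required_disk_bytes(index_bytes: int, reindexes_in_place: bool = False) -> int:
    """Total disk the index may need temporarily, per the rule of thumb.

    Assumption: 2x the current index size covers an optimize or normal
    segment merging; 3x covers frequent reindexing without first
    deleting the existing index data.
    """
    factor = 3 if reindexes_in_place else 2
    return index_bytes * factor


def has_enough_headroom(index_bytes: int, free_bytes: int,
                        reindexes_in_place: bool = False) -> bool:
    """Check whether free space covers the temporary growth beyond the current index."""
    needed_extra = required_disk_bytes(index_bytes, reindexes_in_place) - index_bytes
    return free_bytes >= needed_extra


if __name__ == "__main__":
    gib = 1024 ** 3
    # A 100 GiB index with 150 GiB free: fine for doubling, not for tripling.
    print(has_enough_headroom(100 * gib, 150 * gib))
    print(has_enough_headroom(100 * gib, 150 * gib, reindexes_in_place=True))
```

On a real machine, `free_bytes` could come from `shutil.disk_usage(path).free` for the volume that holds the index directory.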
> Major faceting performance regressions
> --------------------------------------
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
> Issue Type: Bug
> Affects Versions: 5.0, 5.1, 5.2, 5.3, 6.0
> Reporter: Yonik Seeley
> Priority: Critical
> Attachments: facetcache.diff, simple_facets.diff
>
>
> Use of the highly optimized faceting that Solr had for multi-valued fields
> over relatively static indexes was removed as part of LUCENE-5666, causing
> severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index,
> with each field having between 0 and 5 values per document. *Higher numbers
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time
> ||...................................|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10 | 351.17% | 1587.08% | 3057.28% |
> |100 | 158.10% | 203.61% | 1421.93% |
> |1000 | 143.78% | 168.01% | 1325.87% |
> |10000 | 137.98% | 175.31% | 1233.97% |
> |100000 | 142.98% | 159.42% | 1252.45% |
> |1000000 | 255.15% | 165.17% | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were
> faceted.
> One user who brought the performance problem to our attention:
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in
> SOLR-7190, but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)