[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16153967#comment-16153967
 ] 

Shawn Heisey commented on SOLR-8096:
------------------------------------

bq. I never used the so called optimize functionality so far and now realized 
that the index is completely rebuild which means e.g. duplication of disk 
space. Actually we can't do this because our infrastructure isn't designed for 
this.

I think the Solr reference guide may be missing one of the most critical 
recommendations with *any* Lucene-based software:  Always run with enough disk 
space so that your index can triple in size temporarily.  This recommendation 
is not just for running an optimize -- normal segment merging that happens 
during indexing can also double the size of the index temporarily.

There is only one scenario I know of that can actually triple the index size 
(temporarily).  It is a very specific scenario that may be uncommon in 
practice, but does happen in the wild.  Therefore perhaps the recommendation 
should be amended a little bit to read:  "Always run with enough disk space so 
your indexes can double in size temporarily, unless you frequently perform 
reindexes without deleting all the index data first, in which case you should 
allow for the index to triple in size temporarily."


> Major faceting performance regressions
> --------------------------------------
>
>                 Key: SOLR-8096
>                 URL: https://issues.apache.org/jira/browse/SOLR-8096
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 5.0, 5.1, 5.2, 5.3, 6.0
>            Reporter: Yonik Seeley
>            Priority: Critical
>         Attachments: facetcache.diff, simple_facets.diff
>
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was removed as part of LUCENE-5666, causing 
> severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time          
> ||...................................|| Percent of index being faceted
> ||num_unique_values|| 10%     || 50% || 90% ||
> |10           | 351.17%       | 1587.08%      | 3057.28% |
> |100          | 158.10%       | 203.61%       | 1421.93% |
> |1000 | 143.78%       | 168.01%       | 1325.87% |
> |10000        | 137.98%       | 175.31%       | 1233.97% |
> |100000       | 142.98%       | 159.42%       | 1252.45% |
> |1000000      | 255.15%       | 165.17%       | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to