Re: Solr 7.6 optimize index size increase

Walter Underwood Wed, 17 Jun 2020 09:40:17 -0700

From that short description, you should not be running optimize at all.

Just stop doing it. It doesn’t make that big a difference.


It may take your indexes a few weeks to get back to a normal state after the 
forced merges.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jun 17, 2020, at 4:12 AM, Raveendra Yerraguntla 
> <raveend...@yahoo.com.INVALID> wrote:
> 
> Thank you David, Walter, Erick.
> 1. The first time the bloated index was generated, there was no disk space
> issue. One copy of the index is 1/6 of disk capacity. We ran out of disk
> capacity after more than 2 bloated copies had accumulated.
> 2. Solr was upgraded from 5.*. In 5.*, more than 5 segments caused
> performance issues. Performance in 7.* has not been measured for increasing
> segment counts. I will plan a PT to find the optimum number. The application
> runs incremental indexing multiple times in a work week.
> I will keep you updated on the resolution.
> Thanks again 
>    On Tuesday, June 16, 2020, 07:34:26 PM EDT, Erick Erickson 
> <erickerick...@gmail.com> wrote:  
> 
> It Depends (tm).
> 
> As of Solr 7.5, optimize is different. See: 
> https://lucidworks.com/post/solr-and-optimizing-your-index-take-ii/
> 
> So, assuming you have _not_ specified maxSegments=1, any very large
> segment (near 5G) that has _zero_ deleted documents won’t be merged.
> 
> So there are two scenarios:
> 
> 1> What Walter mentioned. The optimize process runs out of disk space
>     and leaves lots of crud around
> 
> 2> your “older segments” are just max-sized segments with zero deletions.
> 
> 
> All that said… do you have demonstrable performance improvements after
> optimizing? The very name “optimize” is misleading; after all, who
> wouldn’t want an optimized index? In earlier versions of Solr (e.g. 4.x),
> it made quite a difference. In more recent Solr releases, it’s not as clear
> cut. So before worrying about making optimize work, I’d recommend that
> you do some performance tests on optimized and un-optimized indexes. 
> If there are significant improvements, that’s one thing. Otherwise, it’s
> a waste.
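
The comparison suggested above could be sketched roughly as follows, in Python. This is a hypothetical harness, not part of Solr: `run_query` is an assumed callable that sends one query string to whichever index is under test (for example via the collection's select handler) and blocks until the response arrives. The function names and the warmup parameter are also illustrative.

```python
import math
import statistics
import time
from typing import Callable, Dict, Iterable, List


def measure_latencies(run_query: Callable[[str], object],
                      queries: Iterable[str],
                      warmup: int = 0) -> List[float]:
    """Time each query once, after replaying the first `warmup` queries untimed."""
    query_list = list(queries)
    # Warm caches first so the timed pass is not dominated by cold starts.
    for query in query_list[:warmup]:
        run_query(query)
    latencies = []
    for query in query_list:
        start = time.perf_counter()
        run_query(query)
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies: List[float]) -> Dict[str, float]:
    """Return median and nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(latencies)
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {"median": statistics.median(ordered), "p95": ordered[p95_index]}
```

Running the same query log against an optimized and an un-optimized copy of the index and comparing the two summaries gives a first indication of whether the forced merge is worth its cost.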
> 
> Best,
> Erick
> 
>> On Jun 16, 2020, at 5:36 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>> 
>> For a full forced merge (mistakenly named “optimize”), the worst case disk
>> space is 3X the size of the index. It is common to need 2X the size of the
>> index.
>> 
>> When I worked on Ultraseek Server 20+ years ago, it had the same merge
>> behavior. I implemented a disk space check that would refuse to merge if
>> there wasn’t enough free space. It would log an error and send an email to
>> the admin.
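
A minimal sketch of the kind of pre-merge disk space check described above, in Python. This is not Ultraseek or Solr code. The function names, the index directory argument, and the `headroom` factor are assumptions for illustration. The default of 2.0 reads the 3X worst case as the existing index plus up to 2X of additional space. The email notification is left out; only the log message is shown.

```python
import logging
import os
import shutil

log = logging.getLogger("merge_guard")


def index_size_bytes(index_dir: str) -> int:
    """Sum the sizes of all regular files under index_dir."""
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # A file can disappear mid-walk while segments are merged.
                pass
    return total


def safe_to_force_merge(index_size: int, free_bytes: int,
                        headroom: float = 2.0) -> bool:
    """True if free space covers headroom times the current index size."""
    return free_bytes >= headroom * index_size


def check_before_merge(index_dir: str, headroom: float = 2.0) -> bool:
    """Log an error and return False when a forced merge looks unsafe."""
    size = index_size_bytes(index_dir)
    free = shutil.disk_usage(index_dir).free
    if not safe_to_force_merge(size, free, headroom):
        log.error("Refusing forced merge of %s: index is %d bytes, "
                  "only %d bytes free", index_dir, size, free)
        return False
    return True
```

A caller would run `check_before_merge` on each shard's index directory and issue the optimize request only when it returns True.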
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On Jun 16, 2020, at 1:58 PM, David Hastings <hastings.recurs...@gmail.com> 
>>> wrote:
>>> 
>>> I can't give you a 100% true answer, but I've experienced this, and what
>>> "seemed" to happen to me was that the optimize would start and drive the
>>> size up threefold. If you run out of disk space in the process, the
>>> optimize quits, since it can't optimize, and leaves the live index
>>> pieces intact, so now you have the "current" index as well as the
>>> "optimized" fragments.
>>> 
>>> I can't say for certain that's what you ran into, but we found that if you
>>> use an expanding disk, it will keep growing and prevent this from happening;
>>> then the index will contract and the disk will shrink back to only what it
>>> needs. It saved me a lot of headaches not ever needing to worry about disk
>>> space.
>>> 
>>> On Tue, Jun 16, 2020 at 4:43 PM Raveendra Yerraguntla
>>> <raveend...@yahoo.com.invalid> wrote:
>>> 
>>>> 
>>>> When the optimize command is issued, the expectation after the
>>>> optimization process completes is that the index size either decreases or
>>>> at most remains the same. In a Solr 7.6 cluster with 50-plus shards, when
>>>> the optimize command is issued, some of the shards' transient or older
>>>> segment files are not deleted. This happens randomly across all shards.
>>>> When unnoticed, these transient files fill the disk. Currently this is
>>>> handled through monitors, but the question is what causes the
>>>> transient/older files to remain there. Are there any specific race
>>>> conditions that leave the older files undeleted?
>>>> Any pointers around this will be helpful.
>>>> TIA
>>
