Hi Mike,

Thanks a lot for your explanation. I don't know yet whether I'll ever run
into actual trouble with file handles, but just in case, it's good to know that
setNoCFSRatio still allows for some tuning.

Thanks again.

Oliver




From: Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: Wednesday, August 24, 2016 11:09
To: Lucene Users <java-user@lucene.apache.org>; Oliver Kaleske
<oliver.kale...@ptvgroup.com>
Subject: Re: Some segments in compound format, others not

Hi Oliver,

The default behavior of Lucene (well, TieredMergePolicy, the default merge
policy) is to *not* create a compound file for segments that are > 10% of the
total index size at the moment that segment was written.

This way small segments don't use up many file descriptors, while large 
segments don't pay the (smallish) index-time cost of creating the compound file.

See MergePolicy.setNoCFSRatio to change this.
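
To make the rule concrete, here is a rough sketch of the size check described above, written as plain Java without any Lucene dependency. It is a simplified model, not Lucene's actual code: the class name CompoundFileDecision, the method useCompoundFile and its parameters are made up for illustration, and any other limits the merge policy may apply are ignored. The ratio of 0.1 corresponds to the 10% default mentioned above.

```java
// Simplified model of the compound-file decision; NOT Lucene's actual code.
// All names here are hypothetical and exist only for illustration.
final class CompoundFileDecision {

    /**
     * Returns true if a newly written segment would be stored in compound
     * format, assuming the only rule is "segment size <= ratio * index size".
     *
     * @param segmentBytes    size of the segment being written
     * @param totalIndexBytes total size of the index at that moment
     * @param noCFSRatio      fraction of the index size above which a segment
     *                        is left non-compound (0.1 models the 10% default)
     */
    static boolean useCompoundFile(long segmentBytes, long totalIndexBytes, double noCFSRatio) {
        if (noCFSRatio <= 0.0) {
            return false; // model: a ratio of 0 means "never use compound files"
        }
        if (noCFSRatio >= 1.0) {
            return true;  // model: a ratio of 1 means "always use compound files"
        }
        return segmentBytes <= noCFSRatio * totalIndexBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long total = 500 * mb; // assumed total index size

        // 10 MB is below 10% of 500 MB (50 MB), so compound format: true
        System.out.println(useCompoundFile(10 * mb, total, 0.1));
        // 350 MB is above 50 MB, so the segment stays non-compound: false
        System.out.println(useCompoundFile(350 * mb, total, 0.1));
        // With a ratio of 1.0 every segment is compound: true
        System.out.println(useCompoundFile(350 * mb, total, 1.0));
    }
}
```

In this model the cutoff scales with the total index size, which would explain why the largest compound segments differ from one index to the next. Raising the ratio towards 1.0 trades the index-time cost of building compound files for fewer open file handles.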


Mike McCandless

http://blog.mikemccandless.com

On Wed, Aug 24, 2016 at 4:10 AM, Oliver Kaleske <oliver.kale...@ptvgroup.com> 
wrote:
Hi,

I noticed on some of my Lucene indexes that they consist of both segments in 
compound format and segments in non-compound format.
There appears to be a pattern such that smaller segments (in terms of disk 
storage) are in compound format, and larger segments are non-compound.
However, in one index the largest compound segments are on the order of 10 
megabytes, in another they are all around 35 megabytes, and 50 megabytes for 
yet another, so the "limit" between the two seems to show some sort of scaling 
behavior. In each case, the non-compound segments are roughly a factor of 10 
larger.

I failed to find any explanation in the documentation. 
https://lucene.apache.org/core/6_1_0/core/org/apache/lucene/codecs/lucene60/package-summary.html
 ("When using the Compound File format (default in 1.4 and greater) ...") 
actually led me to assume there would be only either compound or non-compound 
segments, but not a mixture of both.

Is this behavior intentional? Is there an explanation digestible for 
non-experts? Can the behavior be modified somehow?
I'm asking both out of curiosity and concern as I'm involved in integrating 
Lucene into a larger server system whose many components have in the past 
occasionally hit file handle limits, so compound format ("for systems that 
frequently run out of file handles") seems like a Good Thing here.

Thanks in advance and best regards,
Oliver

