Hi Oliver,

The default behavior of Lucene (well, TieredMergePolicy, the default merge policy) is to *not* create a compound file for segments that are > 10% of the total index size at the moment that segment was written.
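
To make the rule concrete, here is a minimal sketch of the size check in plain Java. It is a simplified model, not Lucene's actual code: the class name CfsRuleSketch, the method useCompoundFile and its parameters are hypothetical names chosen for illustration, and the 0.1 constant simply mirrors the 10% figure above.

```java
// Simplified model of the 10% rule; NOT Lucene's implementation.
// All names here are hypothetical and exist only for illustration.
final class CfsRuleSketch {

    // Mirrors the "10% of the total index" threshold described in the text.
    static final double DEFAULT_NO_CFS_RATIO = 0.1;

    /**
     * Returns true if, under this model, a segment of segmentBytes written
     * into an index of totalIndexBytes would be packed into a compound file.
     */
    static boolean useCompoundFile(long segmentBytes, long totalIndexBytes, double noCFSRatio) {
        if (noCFSRatio <= 0.0) {
            return false; // ratio 0: never use the compound format
        }
        if (noCFSRatio >= 1.0) {
            return true;  // ratio 1: always use the compound format
        }
        // Otherwise only segments at or below the ratio are packed.
        return segmentBytes <= noCFSRatio * totalIndexBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long totalIndex = 500 * mb; // assumed index size, for illustration only
        for (long sizeMb : new long[] {10, 35, 40, 200}) {
            boolean cfs = useCompoundFile(sizeMb * mb, totalIndex, DEFAULT_NO_CFS_RATIO);
            System.out.println(sizeMb + " MB segment -> compound: " + cfs);
        }
    }
}
```

Because the cutoff is a fraction of the index size rather than a fixed number of megabytes, the largest compound segment grows with the index, which would explain why the apparent limit differs from one index to another. Under this model a ratio of 1.0 packs every segment, which is the direction to go if file handles are the main concern.
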
This way small segments don't use up many file descriptors, while large segments don't pay the (smallish) index-time cost of creating the compound file.

See MergePolicy.setNoCFSRatio to change this.

Mike McCandless

http://blog.mikemccandless.com

On Wed, Aug 24, 2016 at 4:10 AM, Oliver Kaleske <oliver.kale...@ptvgroup.com> wrote:

> Hi,
>
> I noticed on some of my Lucene indexes that they consist of both segments
> in compound format and segments in non-compound format.
> There appears to be a pattern such that smaller segments (in terms of disk
> storage) are in compound format, and larger segments are non-compound.
> However, in one index the largest compound segments are on the order of 10
> megabytes, in another they are all around 35 megabytes, and 50 megabytes
> for yet another, so the "limit" between the two seems to show some sort of
> scaling behavior. In each case, the non-compound segments are roughly a
> factor of 10 larger.
>
> I failed to find any explanation in the documentation.
> https://lucene.apache.org/core/6_1_0/core/org/apache/lucene/codecs/lucene60/package-summary.html
> ("When using the Compound File format (default in 1.4 and greater) ...")
> actually led me to assume there would be only either compound or
> non-compound segments, but not a mixture of both.
>
> Is this behavior intentional? Is there an explanation digestible for
> non-experts? Can the behavior be modified somehow?
> I'm asking both out of curiosity and concern as I'm involved in
> integrating Lucene into a larger server system whose many components have
> in the past occasionally hit file handle limits, so compound format ("for
> systems that frequently run out of file handles") seems like a Good Thing
> here.
>
> Thanks in advance and best regards,
> Oliver