Hi,

I noticed that some of my Lucene indexes contain both segments in compound 
format and segments in non-compound format.
There appears to be a pattern such that smaller segments (in terms of disk 
storage) are in compound format, and larger segments are non-compound.
However, in one index the largest compound segments are on the order of 10 
megabytes, in another they are all around 35 megabytes, and in yet another 
around 50 megabytes, so the threshold between the two formats does not look 
like a fixed size but seems to scale with the index. In each case, the 
non-compound segments are roughly a factor of 10 larger than the compound ones.
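
In case it helps to reproduce the observation, here is a minimal sketch of how 
the segments in an index directory can be classified by format and size. It 
relies only on Lucene's file naming (per-segment files start with an underscore 
followed by the segment name, and a compound segment has a .cfs file). The 
function name classify_segments and the example path are made up for 
illustration and are not part of Lucene.

```python
import os
import re
from collections import defaultdict

# Per-segment files look like "_<name>.<ext>" or "_<name>_<suffix>.<ext>",
# where <name> is a lowercase base-36 counter (e.g. "_0", "_a1").
SEGMENT_FILE = re.compile(r"^(_[0-9a-z]+)[_.]")


def classify_segments(index_dir):
    """Return {segment_name: (is_compound, total_bytes)} for an index directory."""
    sizes = defaultdict(int)
    compound = set()
    for entry in os.listdir(index_dir):
        match = SEGMENT_FILE.match(entry)
        if match is None:
            continue  # not a per-segment file (segments_N, write.lock, ...)
        segment = match.group(1)
        sizes[segment] += os.path.getsize(os.path.join(index_dir, entry))
        if entry.endswith(".cfs"):
            compound.add(segment)
    return {name: (name in compound, size) for name, size in sizes.items()}


if __name__ == "__main__":
    # Hypothetical path; replace with a real index directory.
    result = classify_segments("/path/to/index")
    for name, (is_compound, size) in sorted(result.items(), key=lambda kv: kv[1][1]):
        kind = "compound" if is_compound else "non-compound"
        print(f"{name}\t{kind}\t{size / 1e6:.1f} MB")
```

Sorting the output by size is what makes the pattern visible: the compound 
segments cluster at the small end and the non-compound ones at the large end.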

I could not find an explanation in the documentation. The file format 
description at 
https://lucene.apache.org/core/6_1_0/core/org/apache/lucene/codecs/lucene60/package-summary.html
 ("When using the Compound File format (default in 1.4 and greater) ...") 
actually led me to assume that an index would contain either only compound or 
only non-compound segments, not a mixture of both.

Is this behavior intentional? Is there an explanation that a non-expert can 
follow? Can the behavior be changed, for example through configuration?
I'm asking both out of curiosity and concern as I'm involved in integrating 
Lucene into a larger server system whose many components have in the past 
occasionally hit file handle limits, so compound format ("for systems that 
frequently run out of file handles") seems like a Good Thing here.

Thanks in advance and best regards,
Oliver
