Avi, what are your <mergePolicyFactory> (or <mergePolicy> in Solr 5.x) settings in solrconfig.xml? They might be blank/unspecified, but the behavior you're seeing suggests otherwise.
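For reference, the sketch below shows where these settings might live in solrconfig.xml. It assumes the Solr 5.x <mergePolicy> form that Avi's 5.3.1 uses, with the stock TieredMergePolicy. Each name attribute is mapped by Solr to the matching Lucene setter via reflection (noCFSRatio to setNoCFSRatio, and so on). The maxCFSSegmentSizeMB value is a hypothetical illustration, not a recommendation, so it is left commented out.

```xml
<indexConfig>
  <!-- Compound-file setting for newly flushed segments only;
       per this thread, Solr defaults to false when the element is absent. -->
  <useCompoundFile>false</useCompoundFile>

  <!-- Solr 5.x syntax. Newer Solr versions use
       <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory"> instead. -->
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <!-- Invokes setNoCFSRatio(0.1): a merged segment whose estimated size
         exceeds 10% of the index is written without CFS (0.1 is the default). -->
    <double name="noCFSRatio">0.1</double>

    <!-- Invokes setMaxCFSSegmentSizeMB(2048): merged segments larger than 2 GB
         would never be CFS. Hypothetical value; uncomment to enable.
    <double name="maxCFSSegmentSizeMB">2048</double>
    -->
  </mergePolicy>
</indexConfig>
```

With a roughly 240GB index and the default noCFSRatio of 0.1, a merged segment of up to about 24GB can still be written as CFS, which is consistent with the large .cfs files Avi describes.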
Newly flushed segments: There is a boolean useCompoundFile setting (corresponding to an IndexWriter option) for newly flushed segments only, which are either always CFS or never CFS depending on this setting. In Solr it currently defaults to *not* using CFS, which I think is bad: https://issues.apache.org/jira/browse/SOLR-8997 Lucene defaults to CFS here.

Merged segments: You're right, Otis; it does depend on size. There are multiple ways to configure the thresholds. By default, a merged segment will not be CFS if its estimated size exceeds 10% of the index (this is the "noCFSRatio" setting in Solr). There is also a "maxCFSSegmentSizeMB", but by default it never triggers. These and other settings on the Lucene MergePolicy are set via reflection from Solr using Java setter naming conventions. Therefore, to see what you can set, look at the setters on TieredMergePolicy (including inherited methods).

On Mon, Apr 24, 2017 at 10:22 PM Otis Gospodnetić wrote:

> Hi Uwe,
>
> > For larger segments it will automatically create CFS files
>
> I was under the impression Lucene packed only smaller segments into CFS
> files... based on this 3-year-old comment from Mike:
> https://github.com/elastic/elasticsearch/issues/8919 . Maybe that
> comment is out of date now?
>
> Thanks,
> Otis
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
> On Sun, Apr 23, 2017 at 11:40 AM, Uwe Schindler wrote:
>
>> Hi Avi,
>>
>> There is nothing wrong with CFS files. They are just like zip files,
>> containing multiple other index files. Sometimes, when you add only a few
>> documents, IndexWriter starts to merge several older segments into a new
>> one. For larger segments it will automatically create CFS files, as those
>> segments are unlikely to change. During merging it needs additional disk
>> space.
>> At the end of merging it will delete the old segments, unless they are used
>> by older commit points or index searchers are still referring to them. For
>> indexes that change, you should have at least 2 to 3 times the original
>> index size available as spare disk space. Keep in mind that, e.g., on
>> Windows, where files in use cannot be deleted, you may see older segments
>> for a long time.
>> As far as I know, depending on the merge policy, Solr no longer defaults to
>> not using CFS files. For large segments CFS files are better, as they use
>> fewer file handles. Smaller segments still don't use compound files. So by
>> default it is a matter of segment size, like in Lucene.
>>
>> Uwe
>>
>>
>> On 23 April 2017 11:50:17 CEST, Avi Steiner wrote:
>>>
>>> Hi
>>>
>>> We have a customer with Solr 5.3.1.
>>>
>>> The index contains fewer than 3.5 million docs, and the index folder size
>>> is about 240GB.
>>>
>>> I found that the huge files are .cfs files (compound files) that were
>>> created recently, although only a few documents were added.
>>>
>>> The useCompoundFile parameter is commented out in solrconfig.xml.
>>>
>>> As far as I understand, the default in Solr is false and in Lucene is
>>> true, which means this feature should be disabled.
>>>
>>> I would like to understand why those files were created and why they are
>>> so huge.
>>>
>>> Regards,
>>>
>>> Avi
>>
>>
>> --
>> Uwe Schindler
>> Achterdiek 19, 28357 Bremen
>> https://www.thetaphi.de
>>
>
>
--
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com
