Hello all,

In the example solrconfig.xml file for Solr 4.10.2, there is a comment (appended below) saying that setting checkIntegrityAtMerge to true reduces the risk of propagating index corruption from older segments into new ones, at the expense of slower merging.
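For reference, here is a minimal sketch of what enabling the check would look like. It assumes the element sits inside the <indexConfig> section, as it does in the 4.10.2 example config. All other indexConfig settings are omitted.

```xml
<indexConfig>
  <!-- Changed from the example's default of false: ask the IndexWriter to
       check the integrity of existing segments before merging them into a
       new one. -->
  <checkIntegrityAtMerge>true</checkIntegrityAtMerge>
</indexConfig>
```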
Can someone please point me to any benchmarks or details about the trade-offs? How much of a slowdown occurs, and what factors affect its magnitude?

I have huge indexes with huge merges, so I would really love to enable integrity checking. On the other hand, we have very rarely had a problem with a corrupt index, and we always run CheckIndex at the end of the indexing process when we re-index the entire corpus. I'd like to understand how much this would cost us in merge speed, since re-indexing our corpus takes about 10 days and much of that time is spent merging.

We index 13 million books (nearly 4 billion pages), averaging about 100,000 tokens per book. We now have about 1 million books per shard. Merging 30,000 volumes takes about 30 minutes, and larger merges take longer.

<!-- Use true to enable this safety check, which can help reduce the risk of propagating index corruption from older segments into new ones, at the expense of slower merging. -->
<checkIntegrityAtMerge>false</checkIntegrityAtMerge>

Tom Burton-West
http://www.hathitrust.org/blogs/Large-scale-Search