Hello all,

In the example solrconfig.xml for Solr 4.10.2, there is a comment
(appended below) saying that setting checkIntegrityAtMerge to true
reduces the risk of propagating index corruption from older segments into
new ones, at the expense of slower merging.

Can someone please point me to any benchmarks or details about the
trade-offs? How much of a slowdown occurs, and what factors affect its
magnitude?

I have huge indexes with huge merges, so I would really love to enable
integrity checking. On the other hand, we have very rarely had a problem
with a corrupt index, and we always run CheckIndex at the end of the
indexing process when we re-index the entire corpus.

I'd like to understand how much this will cost us in merge speed, since
re-indexing our corpus takes about 10 days and much of that time is spent
merging.

We index 13 million books (nearly 4 billion pages) averaging about 100,000
tokens per book. We now have about 1 million books per shard. Merging 30,000
volumes takes about 30 minutes, with larger merges taking longer.


  <!--
        Use true to enable this safety check, which can help
        reduce the risk of propagating index corruption from older segments
        into new ones, at the expense of slower merging.
    -->
     <checkIntegrityAtMerge>false</checkIntegrityAtMerge>
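For reference, in the stock 4.10.x example config this element sits inside the <indexConfig> section, so enabling it would presumably look like the fragment below. This is a sketch that assumes no other indexConfig changes are needed; the other settings are elided.

```xml
<indexConfig>
  <!-- other index settings left unchanged (elided here) -->

  <!-- When true, existing segments are integrity-checked (checksums
       verified) before they are merged into a new segment, so corruption
       in an older segment is not silently carried into the merged one. -->
  <checkIntegrityAtMerge>true</checkIntegrityAtMerge>
</indexConfig>
```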

Tom Burton-West
http://www.hathitrust.org/blogs/Large-scale-Search
