Hi,
I think it would be useful if Solr could be configured to run threshold
checks before index replication. This would stop a bad index from being
copied over to the slaves, which are typically the ones serving user requests.
In our case, a Solr indexer indexes the documents. Before starting the
indexing process we disable replication and then index the documents. We
then perform the threshold checks and, if the index looks reasonable, we
enable replication again, so that the Solr query engines have a good index
to serve user queries.
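The workflow above could be sketched roughly as follows. This is only an
illustration, not an existing Solr feature: it assumes the master exposes the
ReplicationHandler at /replication with the disablereplication and
enablereplication commands, that each threshold is a minimum document count
for a query, and that the base URL, the function names and the run_indexing
callback are hypothetical placeholders.

```python
import json
import urllib.parse
import urllib.request


def replication_url(base, command):
    """Build a ReplicationHandler URL such as ...?command=disablereplication."""
    params = {"command": command, "wt": "json"}
    return base + "/replication?" + urllib.parse.urlencode(params)


def count_url(base, query):
    """Build a select URL that returns only the hit count (rows=0)."""
    params = {"q": query, "rows": 0, "wt": "json"}
    return base + "/select?" + urllib.parse.urlencode(params)


def http_get_json(url):
    """Fetch a URL and decode the JSON response body."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def num_found(base, query, fetch=http_get_json):
    """Return the number of documents matching the query."""
    return fetch(count_url(base, query))["response"]["numFound"]


def failed_checks(thresholds, count):
    """Return (query, minimum, actual) for every query below its minimum."""
    failures = []
    for query, minimum in thresholds:
        actual = count(query)
        if actual < minimum:
            failures.append((query, minimum, actual))
    return failures


def guarded_reindex(base, thresholds, run_indexing, fetch=http_get_json):
    """Disable replication, index, and re-enable only if all checks pass."""
    fetch(replication_url(base, "disablereplication"))
    run_indexing()  # hypothetical callback that performs the actual indexing
    failures = failed_checks(thresholds, lambda q: num_found(base, q, fetch))
    if not failures:
        fetch(replication_url(base, "enablereplication"))
    return failures
```

If guarded_reindex returns a non-empty list, replication stays disabled and
the slaves keep serving the previous index until someone investigates.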
I have been wondering whether this facility could be available in Solr
(solrconfig.xml) by default for everyone.
We could have something like this inside the Replication Request Handler
section. Either the master checks before enabling replication, or each slave
checks against the master before downloading the index, whichever is best.
I think it is better for the master to do the check, so that the slaves need
not all run the same check against the master.
<lst name="thresholdchecks">
<str query="id:[* TO *]">100000</str>
<str query="id:[* TO *] AND type:movie">40000</str>
<str query="id:[* TO *] AND type:music">10000</str>
</lst>
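To make the intended semantics concrete, here is a minimal sketch of how such
a block might be read and evaluated. It assumes each str value is the minimum
number of documents the query must match; the thresholdchecks element is only
the proposal above, not existing Solr configuration, and the function names
are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The proposed configuration fragment, copied from the example above.
PROPOSED = """
<lst name="thresholdchecks">
<str query="id:[* TO *]">100000</str>
<str query="id:[* TO *] AND type:movie">40000</str>
<str query="id:[* TO *] AND type:music">10000</str>
</lst>
"""


def parse_threshold_checks(xml_text):
    """Return a list of (query, minimum_count) pairs from the fragment."""
    root = ET.fromstring(xml_text.strip())
    if root.tag != "lst" or root.get("name") != "thresholdchecks":
        raise ValueError("expected <lst name=\"thresholdchecks\">")
    return [(el.get("query"), int(el.text.strip())) for el in root.findall("str")]


def index_is_acceptable(checks, count):
    """True if every query matches at least its minimum number of documents.

    `count` is any callable mapping a query string to a document count,
    for example a rows=0 request against the master.
    """
    return all(count(query) >= minimum for query, minimum in checks)
```

With this reading, replication would be enabled only when
index_is_acceptable returns True for the freshly built index.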
I think this is a very common task for people using Solr replication. I am
interested in working on this feature and committing it. Before that, I would
like to know your views on it. If something like this already exists or is
planned, please let me know!
Thanks & Regards,
Kranti K Parisa
http://www.linkedin.com/in/krantiparisa