We've had some experience with this. As per usual, customers tend to want the holy grail: no data loss when one data centre blows up, but no increased latency when updating data. This then somehow, magically, has to work over a slow uplink between two data centres without saturating the link.
Currently we use NRT replicas across data centres. That does add some latency, but consistency is a bit more important for us. Overall, this works pretty well.

The biggest problems we've experienced have all been related to recovering replicas across a slow data centre uplink. A saturated link can cause multiple replicas to lag behind, and when the lag gets too bad, they too will go into recovery, and then the shit really hits the fan. I'm not sure whether there are any easy ways of improving that behaviour. Limiting the maximum bandwidth per Solr instance during recovery, perhaps? Slow recovery is better than destructive recovery.

External tools like Kafka add a lot of operational overhead. One of the great things about SolrCloud is how simple the whole replication setup is.

On 06/12/2020 14:46, Erick Erickson wrote:
>> I can see at least two different approaches here, your mention of SolrJ
>> seems to hint at the first one:
>> 1. Get the data as it comes from the client and fork it to local and remote
>> data centers,
>> 2. Create (an asynchronous) stream replicating local data center data to
>> remote.
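To make the "limit max bandwidth during recovery" idea concrete: the usual mechanism is a token bucket in front of the transfer loop. This is just a sketch of the general technique, not an existing Solr knob; the class name and parameters are my own.

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: holds up to `capacity` bytes of credit,
    refilled at `rate_bytes_per_sec`. consume() blocks until enough credit
    is available, so a recovery copy loop wrapped in it can never exceed
    the configured bandwidth for long."""

    def __init__(self, rate_bytes_per_sec, capacity=None):
        self.rate = float(rate_bytes_per_sec)
        self.capacity = float(capacity if capacity is not None else rate_bytes_per_sec)
        self.tokens = self.capacity
        self.last = time.monotonic()

    def _refill(self):
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def consume(self, nbytes):
        # Block until nbytes of credit are available, then spend them.
        while True:
            self._refill()
            if self.tokens >= nbytes:
                self.tokens -= nbytes
                return
            time.sleep((nbytes - self.tokens) / self.rate)
```

A recovery loop would then call `bucket.consume(len(chunk))` before sending each chunk over the uplink: slow, but it stops one recovering replica from saturating the link and dragging the others into recovery too.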
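Erick's option 1 (fork writes to local and remote) can be sketched as a thin wrapper: index synchronously into the local cluster, enqueue the same batch for asynchronous delivery to the remote one. `local` and `remote` here are hypothetical client objects exposing an `add(docs)` method (in practice, wrappers around SolrJ or similar clients); the retry logic is deliberately naive.

```python
import queue
import threading

class ForkingWriter:
    """Sketch of the dual-write approach: local writes are synchronous
    (consistency first), remote writes drain from a queue in the
    background, so the slow uplink never adds latency to the client."""

    def __init__(self, local, remote):
        self.local = local
        self.remote = remote
        self.queue = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def add(self, docs):
        self.local.add(docs)   # synchronous: client sees local latency only
        self.queue.put(docs)   # asynchronous: remote DC lags but catches up

    def _drain(self):
        while True:
            docs = self.queue.get()
            try:
                self.remote.add(docs)
            except Exception:
                self.queue.put(docs)  # naive retry; real code needs backoff
            finally:
                self.queue.task_done()
```

The catch, of course, is the queue itself: once it has to survive restarts and bound its memory use, you have reinvented a durable log, which is exactly the operational overhead that makes Kafka unattractive here.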
