We've had some experience with this. As per usual, customers tend to
want the holy grail: no data loss when one data centre blows up, but no
increased latency when updating data. This then somehow, magically, has
to work over a slow uplink between two data centres without saturating
the link.

Currently we use NRT (near-real-time) replicas across data centres, which
does add some latency, but consistency matters more to us. Overall, this
works pretty well.

The biggest problems we've experienced have all been related to
recovering replicas across a slow data centre uplink. A saturated link
can cause multiple replicas to lag behind, and when the lag grows too
large, they too go into recovery, and then things really fall apart.

I'm not sure whether there are any easy ways of improving that
behaviour. Limiting the maximum bandwidth per Solr instance during
recovery, perhaps? Slow recovery is better than destructive recovery.
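If I remember correctly, the replication handler (which full-copy
recovery also goes through) supports a `maxWriteMBPerSec` throttle.
Something along these lines in solrconfig.xml might do it; I haven't
verified the exact placement or a sensible value, so treat this as a
sketch:

```xml
<!-- Untested sketch: cap the transfer rate of the replication handler,
     which recovering replicas use to pull a full index copy. The value
     (in MB/sec) is a placeholder and would need tuning against the
     actual cross-DC uplink capacity. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="defaults">
    <str name="maxWriteMBPerSec">16</str>
  </lst>
</requestHandler>
```

That would at least stop one recovering replica from saturating the
uplink and dragging its healthy siblings into recovery too.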

External tools like Kafka add a lot of operational overhead. One of the
great things about SolrCloud is how simple the whole replication setup is.



On 06/12/2020 14:46, Erick Erickson wrote:
>> I can see at least two different approaches here, your mention of SolrJ 
>> seems to hint at the first one:
>> 1. Get the data as it comes from the client and fork it to local and remote 
>> data centers,
>> 2. Create (an asynchronous) stream replicating local data center data to 
>> remote.
