Re: [DISCUSS] Cross Data-Center Replication in Apache Solr

Erick Erickson Sun, 06 Dec 2020 05:46:58 -0800

Anshum:

I know I’ve been recommending something like this to clients for a while.
Do you think a call to the community for people who’ve already put
something in the middle might net us some good info on the lurking
gremlins? Mind you, “recommend” hasn’t actually involved me _doing_ it,
so I don’t have any actual experience there…


But yeah, absolutely +1 for something making this easier for clients...

Erick

> On Dec 5, 2020, at 11:43 AM, Ilan Ginzburg <[email protected]> wrote:
> 
> That's an interesting initiative Anshum!
> 
> I can see at least two different approaches here, your mention of SolrJ seems 
> to hint at the first one:
> 1. Get the data as it comes from the client and fork it to the local and 
> remote data centers,
> 2. Create an asynchronous stream replicating the local data center's data 
> to the remote one.
> 
> Option 1 is strongly consistent but adds latency, and potentially blocking, 
> to the critical path.
> Option 2 could look like remote PULL replicas. It might have a lower impact 
> on the local data center, but it has to deal with the remote data center 
> always being somewhat behind. If the client application can handle that, the 
> performance and efficiency gains (and possibly a simpler implementation, since 
> it doesn't require another persistence layer) might be worth it...
> 
> Ilan
> 
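A minimal sketch contrasting the two options above, assuming a toy in-memory stand-in for each data center. `DataCenter`, `SyncForkingClient`, and `AsyncReplicator` are hypothetical names for illustration only, not Solr or SolrJ APIs. Option 1 writes to every data center before returning; option 2 writes locally and queues updates for later shipment, so the remote lags until the queue is drained.

```python
from collections import deque


class DataCenter:
    """Hypothetical stand-in for a Solr cluster in one data center."""

    def __init__(self, name):
        self.name = name
        self.docs = {}

    def index(self, doc):
        self.docs[doc["id"]] = doc


class SyncForkingClient:
    """Option 1 (sketch): fork every update to all data centers before
    acknowledging it. Remotes are never behind, but the slowest data
    center sits on the critical path."""

    def __init__(self, local, remotes):
        self.targets = [local, *remotes]

    def add(self, doc):
        for dc in self.targets:
            dc.index(doc)  # a slow or failing remote delays/fails the whole update


class AsyncReplicator:
    """Option 2 (sketch): write locally, queue the update, and ship it to
    remotes later, so remotes are always somewhat behind."""

    def __init__(self, local, remotes):
        self.local = local
        self.remotes = remotes
        self.pending = deque()

    def add(self, doc):
        self.local.index(doc)      # only the local write is on the critical path
        self.pending.append(doc)   # remote replication happens later

    def drain(self):
        # Ship queued updates to every remote in arrival order.
        while self.pending:
            doc = self.pending.popleft()
            for dc in self.remotes:
                dc.index(doc)
```

In the synchronous version, a slow or unreachable remote delays every update; in the asynchronous version, the remote only catches up when `drain()` runs.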
> On Fri, Dec 4, 2020 at 5:24 PM Anshum Gupta <[email protected]> wrote:
> Hi everyone,
> 
> Large-scale Solr installations often require cross data-center replication, 
> both for access latency and for disaster recovery. In the past, users have 
> either designed their own solutions to deal with this or have tried to rely 
> on the now-deprecated CDCR.
> 
> It would be really good to have support for cross data-center replication 
> within Solr that is offered and supported by the community. This would allow 
> efforts around this shared problem to converge.
> 
> I’d like to propose a new solution based on my experiences at my day job. The 
> key points about this approach:
>       • Uses an external, configurable messaging system in the middle for 
> actual replication/mirroring.
>       • We offer an abstraction and some default implementations based on 
> what we can support and what users really want. An example here would be 
> Kafka.
>       • This would be a separate repository, allowing it to have its own 
> release cadence. We shouldn’t have to release it with every Solr release, as 
> the overlap is limited to SolrJ interactions.
> 
> I’ll soon share a more detailed, evolving design document for everyone to 
> contribute to, but I wanted to share this now as I’m starting to work on it 
> and want to avoid parallel efforts toward the same end goal.
> 
> -- 
> Anshum Gupta
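A hypothetical sketch of the pluggable abstraction described in the proposal's first two points: an interface over the messaging system in the middle, with a toy in-memory implementation standing in for a real default such as a Kafka-backed one (not shown). `MessageQueue`, `InMemoryQueue`, and `Mirror` are illustrative names only, and the `apply` callback stands in for whatever SolrJ call would write to the target cluster.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Callable, Optional


class MessageQueue(ABC):
    """Hypothetical abstraction over the external messaging system.
    A Kafka-backed implementation would be one possible default."""

    @abstractmethod
    def publish(self, update: dict) -> None: ...

    @abstractmethod
    def poll(self) -> Optional[dict]: ...


class InMemoryQueue(MessageQueue):
    """Toy implementation for illustration and testing only."""

    def __init__(self):
        self._items = deque()

    def publish(self, update: dict) -> None:
        self._items.append(update)

    def poll(self) -> Optional[dict]:
        return self._items.popleft() if self._items else None


class Mirror:
    """Hypothetical consumer in the target data center: reads mirrored
    updates from the queue and applies them via a caller-supplied function
    (in practice, presumably a SolrJ update call)."""

    def __init__(self, queue: MessageQueue, apply: Callable[[dict], None]):
        self.queue = queue
        self.apply = apply

    def run_once(self) -> int:
        # Apply everything currently queued; return how many updates were applied.
        applied = 0
        while (update := self.queue.poll()) is not None:
            self.apply(update)
            applied += 1
        return applied
```

Keeping the transport behind an interface is what would let the mirroring logic stay the same whichever messaging system a given installation plugs in.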

