That's an interesting initiative, Anshum!

I can see at least two different approaches here. Your mention of SolrJ
seems to hint at the first one:
1. Take the data as it arrives from the client and fork it to the local and
remote data centers.
2. Create an asynchronous stream that replicates the local data center's
data to the remote one.

Option 1 is strongly consistent but adds latency, and potentially blocking,
on the critical path.
Option 2 could look like remote PULL replicas. It might have a lower impact
on the local data center, but it has to deal with the remote data center
always being somewhat behind. If the client application can handle that, the
performance and efficiency gain (and possibly a simpler implementation, since
it doesn't require another persistence layer) might be worth it...
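
To make the trade-off concrete, here is a rough sketch of the two options.
It is only an illustration under stated assumptions: DataCenter, fork_write
and AsyncMirror are hypothetical names, the "data centers" are in-memory
dictionaries rather than Solr clusters, and the in-process queue merely
stands in for whatever transport would carry updates to the remote side (a
real option 2 could just as well have the remote side pull).

```python
import queue
import threading


class DataCenter:
    """Hypothetical stand-in for a Solr cluster: an in-memory document store."""

    def __init__(self, name):
        self.name = name
        self.docs = {}

    def index(self, doc_id, doc):
        self.docs[doc_id] = doc


def fork_write(local, remote, doc_id, doc):
    """Option 1: return only after both data centers have the document."""
    local.index(doc_id, doc)
    # The remote round trip sits on the client's critical path.
    remote.index(doc_id, doc)


class AsyncMirror:
    """Option 2: write locally, replicate to the remote DC in the background."""

    def __init__(self, local, remote):
        self.local = local
        self.remote = remote
        self._pending = queue.Queue()
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def write(self, doc_id, doc):
        # Returns as soon as the local write is done; the remote lags behind.
        self.local.index(doc_id, doc)
        self._pending.put((doc_id, doc))

    def _drain(self):
        while True:
            doc_id, doc = self._pending.get()
            self.remote.index(doc_id, doc)
            self._pending.task_done()

    def wait_until_caught_up(self):
        """Block until every queued update has reached the remote DC."""
        self._pending.join()
```

With fork_write the caller pays for the remote write on every update. With
AsyncMirror.write the caller only pays for the local write, and a reader in
the remote data center may see stale data until the queue drains.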

Ilan

On Fri, Dec 4, 2020 at 5:24 PM Anshum Gupta <[email protected]> wrote:

> Hi everyone,
>
>
> Large scale Solr installations often require cross data-center replication
> in order to achieve data replication for both, access latency reasons as
> well as disaster recovery. In the past users have either designed their own
> solutions to deal with this or have tried to rely on the now-deprecated
> CDCR.
>
>
> It would be really good to have support for cross data-center replication
> within Solr, that is offered and supported by the community. This would
> allow the effort around this shared problem to converge.
>
>
> I’d like to propose a new solution based on my experiences at my day job.
> The key points about this approach:
>
>    1. Uses an external, configurable, messaging system in the middle for
>    actual replication/mirroring.
>    2. We offer an abstraction and some default implementations based on
>    what we can support and what users really want. An example here would be
>    Kafka.
>    3. This would be a separate repository allowing it to have its own
>    release cadence. We shouldn’t have to release this with every Solr release
>    as the overlap is just limited to SolrJ interactions.
>
>
> I’ll share a more detailed and evolving document soon with the design for
> everyone else to contribute to but wanted to share this as I’m starting to
> work on this and wanted to avoid parallel efforts towards the same end-goal.
>
> --
> Anshum Gupta
>
