Hi Amit,

It sounds like you need separate ES clusters (one per DC) and a way to feed the data into them all consistently.

I happened to scan-read the tribe node documentation - it looks like it could work well for reads, but (IIRC) it will not do writes.
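From a quick look at those docs, a tribe node is configured in elasticsearch.yml simply by listing the clusters it should join - roughly like this (the "dc1"/"dc2" names here are placeholders, adapted from the docs Michael linked below):

    tribe:
        dc1:
            cluster.name: cluster_dc1
        dc2:
            cluster.name: cluster_dc2

The tribe node merges the cluster states and can search across both clusters, but (as above) it isn't a solution for the writes.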
For the writes, I suspect you want some message-passing system (e.g. RabbitMQ) or Redis (acting as a cache). If you were shipping (system) logs then something like Logstash would help with the interfacing; however, I suspect that is not the case, so you would need to find the integration (between the message-passing system / Redis and ES) that fits your setup. In short, you could probably use something like tribe nodes for the reads and a message-passing/proxy system for the writes.

Cheers,
Ivan

On 22/02/2014 18:32, Amit Soni wrote:
> Hello Michael - understood that ES is not built to maintain consistent
> cluster state across data centers. What I am wondering is whether there
> is a way for Elasticsearch to continue to replicate data onto a
> different data center (with some delay, of course) so that when the
> primary center fails, the failover data center still has most of the
> data (except perhaps for the last few seconds/minutes/hours).
>
> Overall I am looking for the right way to implement a cross-data-center
> deployment of Elasticsearch!
>
> -Amit.
>
>
> On Fri, Feb 21, 2014 at 9:37 AM, Michael Sick
> <[email protected]> wrote:
>
>     Dario,
>
>     I believe that you're looking for tribe nodes:
>     http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/modules-tribe.html
>
>     ES is not built to cluster consistently across DCs / larger network
>     lags.
>
>     On Fri, Feb 21, 2014 at 11:24 AM, Dario Rossi
>     <[email protected]> wrote:
>
>         Hi,
>         I have the following problem: our application publishes content
>         to an Elasticsearch cluster. We use local data-less nodes for
>         querying Elasticsearch, so we don't use the HTTP REST API and
>         the local nodes act as the load balancer. Now there is a new
>         requirement to have the cluster replicated to another data
>         center too (and maybe more in the future) for resilience.
>
>         At the very beginning we thought of having one large cluster
>         that spans data centers (crazy). This solution has the
>         following problems:
>
>         - The cluster has the split-brain problem (!)
>         - The client data-less nodes will try to send requests across
>           different data centers (is there a solution to this???). I
>           can't find a way to avoid this. We don't want this to happen
>           because of a) latency and b) firewalling issues.
>
>         So we started to think that this solution is not really viable,
>         and thought instead of having one cluster per data center, which
>         seems more sensible. But then we have the problem that we must
>         publish data to all clusters and, if one fails, we have no means
>         of rolling back (unless we set up a complicated version-based
>         rollback system). I find this very complicated and hard to
>         maintain, although it may be somewhat doable.
>
>         My biggest problem is that we have to keep the data centers in
>         the same state at all times, so that if one goes down, we can
>         readily switch to the other.
>
>         Any ideas, or can you recommend some approach to help us deal
>         with this?

--
Ivan Beveridge
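P.S. To make the write side a bit more concrete, here is a rough per-DC consumer sketch, assuming RabbitMQ (via the pika client, pre-1.0 API) and the elasticsearch-py client - the queue name, hosts and the "id" field are all made up for illustration, not a recommendation of specific libraries:

    import json

    import pika
    from elasticsearch import Elasticsearch

    # Index into the cluster that is local to this DC; every DC runs its
    # own copy of this consumer against a replicated/mirrored queue.
    es = Elasticsearch(["localhost:9200"])

    def handle(channel, method, properties, body):
        doc = json.loads(body)
        # Reuse the producer-assigned id so redeliveries are idempotent.
        es.index(index="content", doc_type="item", id=doc["id"], body=doc)
        channel.basic_ack(delivery_tag=method.delivery_tag)

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="es-writes", durable=True)
    channel.basic_consume(handle, queue="es-writes")
    channel.start_consuming()

If a cluster goes down, its consumer simply stops acking and the queue buffers the writes until it catches up again - which also sidesteps the rollback problem Dario mentioned.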
