Hi!

Have there been any further explorations in the area of WAN replication?

I have ES clusters in multiple datacenters connected via a high-speed private 
network. I'm wondering whether multi-master replication would be possible in 
this environment, or whether we'd need some type of 'shovel' plugin like the 
one described here to ship data between the DCs.

Thanks,
Matthew

On Tuesday, July 23, 2013 10:06:10 AM UTC-7, Jörg Prante wrote:
>
> Yes, I once examined Kafka and discovered that many of its components 
> already exist in Elasticsearch. For example, the activity stream is already 
> there as the ES translog (if you focus on indexing operations), and the ES 
> gateway is a useful persistent store mechanism. What I didn't like was the 
> single Kafka JVM and the ZooKeeper infrastructure; both add complexity 
> on top of ES.
>
> For cross-cluster replication, I think the best approach is distributed 
> log replication. This is hard, because logged ES operations must be 
> ordered by a shared clock (e.g. vector clocks) before they can be used 
> as a global event stream. A pubsub mechanism could then run as a service 
> at the primary shards of an index in the ES node, merging the translogs 
> for an external agent that has previously subscribed to the replication 
> stream. The vector clock is required for distributed, time-machine-like 
> behavior (snapshots), assuming the translog is not deleted but retained 
> for a certain time window.
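
A minimal sketch of the vector-clock ordering mentioned above, assuming each cluster stamps its logged operations with a map of per-cluster counters. The function names and the cluster labels `dc1` and `dc2` are hypothetical; nothing here is an Elasticsearch API.

```python
def tick(clock, node):
    """Return a copy of the clock with this node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a, b):
    """Element-wise maximum, applied when a remote operation is received."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


def compare(a, b):
    """Classify two clocks as 'equal', 'before', 'after' or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    # Neither clock dominates: the operations were logged independently.
    return "concurrent"
```

Operations whose clocks compare as "concurrent" were logged independently in different clusters, so they are the ones that would need a conflict-resolution rule when the streams are merged.
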
>
> Jörg
>
> On Tue, Jul 23, 2013 at 3:55 PM, Vinicius Carvalho <[email protected]> wrote:
>
>> Thanks again, Jörg. Just so you know, I'm actually considering using Kafka 
>> for inter-cluster replication. We want to push the index operations to a 
>> topic, and clusters in other DCs would then subscribe to it. Conflict 
>> resolution will be last commit wins. In case of a Kafka cluster failure, 
>> we will append changes to a local index and send them over once the bus 
>> is back. If the ES cluster dies and later recovers, one nice thing about 
>> Kafka is that one can request messages based on an offset, so we could 
>> resume consuming messages from the last point at which the cluster had 
>> consumed them.
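
A rough sketch of the two ideas above (last commit wins, and resuming from a stored offset), assuming a plain in-memory list stands in for the Kafka topic. The `ReplicaConsumer` class and the `id`, `ts` and `source` fields are hypothetical and do not come from any Kafka or Elasticsearch client.

```python
class ReplicaConsumer:
    """Applies replicated index operations with last-write-wins semantics."""

    def __init__(self):
        self.docs = {}        # doc id -> (timestamp, source) of the winning write
        self.next_offset = 0  # position of the next log entry to read

    def consume(self, log):
        """Apply every entry from next_offset onward, then store the new offset."""
        for op in log[self.next_offset:]:
            current = self.docs.get(op["id"])
            # Last write wins: keep the entry carrying the newest timestamp.
            if current is None or op["ts"] >= current[0]:
                self.docs[op["id"]] = (op["ts"], op["source"])
        self.next_offset = len(log)
```

After a restart, calling `consume` again with the persisted `next_offset` skips entries that were already applied. Note that last-write-wins on wall-clock timestamps only behaves sensibly if the clocks in the participating DCs are kept reasonably in sync.
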
>>
>> These are all ideas I'm working on right now. I'll probably have time to 
>> start coding them soon. Thanks for all the support :)
>>
>> Cheers
>>
>> 
