To be clear, you're planning on building a single cluster (i.e. all
machines share the exact same configuration files, security database,
etc.) that spans multiple physical locations on different continents? I
had thought you were talking about database replication before, which
was built to replicate between two different clusters (i.e. different
config files, security databases, etc. for each cluster).
I would recommend engaging our professional services if you're thinking
of going down that road. I don't know of anyone who has built such a
beast, and there are real drawbacks I can think of off the top of my head.
1. Host failover will not happen unless a host is part of a surviving
quorum of N/2+1 hosts. If you have two data centers with exactly the
same number of hosts in each and one goes down (or the link between
them fails), the survivors see only N/2 hosts, which is short of a
quorum, so no failover would happen. You would need at least one host
somewhere else, say in a third data center, to achieve a quorum if a
data center goes down.
2. Local-disk replication is synchronous, which I expect would greatly
slow your update rate, because every update waits for a reply from a
far-away host. No prepare or commit operation would proceed until
there has been a complete intercontinental round trip.
3. Queries would always go to whichever forest is currently open as the
master. If you have a master plus two replicas, the ordering in the
config file controls which replica would open first. So for example,
ForestA-US-Master would list ForestA-US-Replica first and then
ForestA-UK-Replica if you want the US replica to take over before
the UK replica.
4. If you lose connectivity between continents, whichever data center
hosts your security database would survive, but the other data
center(s) would go offline due to the security database being
unavailable.
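
To make the quorum arithmetic in point 1 concrete, here is a minimal
sketch in plain Python. It is not MarkLogic code; the function names and
the host counts per site are hypothetical and chosen only for
illustration. It applies the N/2+1 rule to the hosts that remain after
one site is lost.

```python
def quorum_size(total_hosts: int) -> int:
    """Smallest majority of the cluster: N/2 + 1, using integer division."""
    return total_hosts // 2 + 1


def has_quorum(visible_hosts: int, total_hosts: int) -> bool:
    """True if the hosts that can still see each other form a quorum."""
    return visible_hosts >= quorum_size(total_hosts)


def survivors_have_quorum(layout: dict, failed_site: str) -> bool:
    """Check whether the remaining sites form a quorum after one site fails."""
    total = sum(layout.values())
    surviving = total - layout[failed_site]
    return has_quorum(surviving, total)


# Hypothetical layouts: number of hosts in each data center.
even_split = {"US": 3, "EU": 3}
with_tiebreaker = {"US": 3, "EU": 3, "third-site": 1}

print(survivors_have_quorum(even_split, "US"))       # False: 3 of 6, need 4
print(survivors_have_quorum(with_tiebreaker, "US"))  # True: 4 of 7, need 4
```

With an even split, losing either site leaves exactly half the hosts,
which is one short of a quorum. A single extra host at a third site is
enough to tip the balance.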
I think you'd have a more resilient system if you have a separate
cluster in each data center, and use database replication between them.
You can replicate your security database from one cluster to the other
if you like, and the replica cluster will query it at a slight lag.
Wayne.
On 07/19/2012 11:28 AM, Danny Sinang wrote:
Hi Wayne,
I was planning to use local-disk failover also between the US and EU
servers.
Question is, since all US and EU servers will be in the same cluster,
which failover host will automatically take over first?
I'm hoping a US-based failover host will take over first if a US-based
forest goes down.
Regards,
Danny
On Thu, Jul 19, 2012 at 1:52 PM, Wayne Feick
<[email protected]> wrote:
With local-disk failover, replica forests will automatically take
over when hosts in the same cluster fail.
With database replication, there is no automatic failover when the
entire master cluster goes down. If you want automatic failover,
you have to build that up yourself since it involves more than
just your MarkLogic instance (i.e. a failed data center likely
means non-MarkLogic servers are also failing over).
The documentation discusses the issues you need to take into
consideration, but I'll provide a quick overview here.
Database replication is performed independently and asynchronously
between each master/replica forest pair (as opposed to
synchronously for local-disk failover that you are also using). As
a result, when an active master cluster fails you potentially have
a situation where one participant forest has replicated its
portion of a committed transaction but another participant forest
has not.
MarkLogic Server runs queries at a particular point in time (i.e. a
commit timestamp), and when running against a replica database it
automatically determines the most current timestamp a query can
use based on how up to date each forest is. Queries see a
consistent replica database at a slight lag behind the master
database.
If you're just going to query the replica database when the master
cluster goes down, you don't need to do anything. The server will
automatically run queries at the most recent commit timestamp it
can until the master cluster becomes available again.
If you decide to do updates on the replica database, you run into
the issue of partially replicated transactions. You can use
xdmp:forest-status() to see how up to date each forest is, choose
the minimum commit timestamp across all forests, deconfigure
replication on the replica database, and do a rollback to that
timestamp. This ensures all forests in the database are consistent
to that point in time, but potentially drops a few transactions
that had been committed on the master but not yet fully replicated.
Later, when your failed master database comes back up, you
configure it as a replica of the database you promoted to master
and just the differences will be replicated back.
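
The choice of rollback point described above can be sketched in plain
Python. This is only an illustration of the "minimum commit timestamp
across all forests" step: the forest names and timestamp values are
made up, and the sketch assumes you have already collected one
replicated commit timestamp per forest (for example from
xdmp:forest-status()). It does not call any MarkLogic API.

```python
def safe_rollback_timestamp(forest_timestamps: dict) -> int:
    """Return the latest timestamp that every forest has fully replicated.

    This is simply the minimum across forests: any later timestamp would
    include transactions that at least one forest has not yet received.
    """
    if not forest_timestamps:
        raise ValueError("no forests given")
    return min(forest_timestamps.values())


def forests_ahead(forest_timestamps: dict, rollback_to: int) -> list:
    """Names of forests holding commits newer than the rollback point."""
    return sorted(
        name for name, ts in forest_timestamps.items() if ts > rollback_to
    )


# Hypothetical per-forest replicated commit timestamps.
replicated = {"ForestA": 1005, "ForestB": 1003, "ForestC": 1005}

target = safe_rollback_timestamp(replicated)
print(target)                             # 1003
print(forests_ahead(replicated, target))  # ['ForestA', 'ForestC']
```

The forests listed as ahead are the ones that would lose their partially
replicated transactions when you roll back to the common timestamp.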
Wayne.
On 07/19/2012 09:54 AM, Danny Sinang wrote:
Hi Wayne,
If I am replicating the forests of US1 to US2 and EU2, what
happens when US1 goes down?
Which server will take over and serve the US1 forests? US2 or
EU2?
Regards,
Danny
On Wed, Jul 18, 2012 at 7:16 PM, Wayne Feick
<[email protected]> wrote:
Yes, this should work fine.
You'll have a master US-Database in the US, and a master
UK-Database in the UK, with each replicating to the other site.
Local-disk failover works fine at both ends for both master
and replica databases.
Extending to a third site should also be fine; just keep in
mind that a network brownout to either of the replica sites
will degrade your foreground performance in order to enforce
the lag limit.
Wayne.
On 07/18/2012 12:40 PM, Danny Sinang wrote:
Hi,
We currently have a 3-node ML cluster here in the US (let's
call them US1, US2, US3), with forest replication and
failover enabled.
Should we need to expand to Europe, would the setup below
achieve:
1. Traffic localization (during normal operations)
2. Continued ML availability (in the event we ever need
to bring one cluster down for hardware or software
upgrades / fixes)?
*Draft EU Expansion Plan*
1. Set up another 3-node ML cluster in Europe (EU1, EU2,
EU3), with forest replication and failover enabled.
2. Also replicate the forests of the US cluster to the
EU cluster and vice versa
3. Direct US customers (via some geo DNS) to US
webservers which use US1, US2, and US3
- This would save US customer data to the
forests on US1, US2, and US3
4. Direct EU customers to EU webservers which use EU1,
EU2, and EU3
- This would save EU customer data to the
forests on EU1, EU2, and EU3
5. Should the US ML cluster ever go down, point the US
webservers to the EU ML cluster
- I'm hoping this would activate and make
available the data from US1, US2, and US3 on EU1, EU2,
and EU3. Am I correct?
- Same thing should happen the other way around
(i.e. if the EU ML cluster goes down, point EU webservers to
the US ML cluster)
Do you think this would work?
Is there a better way to achieve our goals?
How do we extend this model should the time come for us to
expand to Asia?
Regards,
Danny
--
Wayne Feick
Principal Engineer
MarkLogic Corporation
[email protected]
Phone: +1 650 655 2378
www.marklogic.com
This e-mail and any accompanying attachments are confidential. The
information is intended solely for the use of the individual to whom it is
addressed. Any review, disclosure, copying, distribution, or use of this e-mail
communication by others is strictly prohibited. If you are not the intended
recipient, please notify us immediately by returning this message to the sender
and delete all copies. Thank you for your cooperation.
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general