On Nov 6, 2007, at 4:01 PM, Dominik Klein wrote:
Hi
A week ago I asked whether there was a resource agent that
implements Master/Slave for a Postgres cluster using slony-1
replication.
There was not, so I tried to implement it myself.
I want to report back and explain why I think it is not possible
(at the moment) to implement this in heartbeat.
Here we go:
Short summary of slony-1 replication:
In a slony-1 replication setup
* Tables are put together to replication "sets"
* Each set has an "origin" (master)
* Only the origin can be written to
* There can be multiple sets with a different origin each
* There can be multiple "subscribers" (slaves) for each set
* Subscribers are read-only
Since you have to tie the master role to the health of postgres
itself, this restricts you to using only one set, or to managing
all sets at once. Well, okay, I think I could live with this.
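To illustrate the concepts above, a minimal slonik script for one set
with one origin and one subscriber might look roughly like this (the
cluster name, conninfo strings and table name are made-up placeholders,
not from my actual setup):

```
cluster name = pgcluster;
node 1 admin conninfo = 'dbname=mydb host=nodeA user=slony';
node 2 admin conninfo = 'dbname=mydb host=nodeB user=slony';

# one set, with node 1 as its origin (master)
create set (id = 1, origin = 1, comment = 'all replicated tables');
set add table (set id = 1, origin = 1, id = 1,
               fully qualified name = 'public.mytable');

# node 2 subscribes to the set and becomes a read-only slave
subscribe set (id = 1, provider = 1, receiver = 2, forward = no);
```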
Slony-1 implements two commands for "switchover" and "failover". By
switchover I mean a planned switch of roles while all machines are
healthy; by failover I mean the slave taking over because the master
has a problem.
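In slonik terms (set and node ids as in a hypothetical two-node setup),
these two operations are MOVE SET and FAILOVER:

```
# planned switchover: both nodes must be up and reachable
lock set (id = 1, origin = 1);
move set (id = 1, old origin = 1, new origin = 2);

# failover: last resort when the old origin is gone;
# the failed node is abandoned and must be rebuilt
failover (id = 1, backup node = 2);
```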
So now comes the tricky part.
In slony-1 you cannot make an origin a subscriber without making
another subscriber the new origin. This happens in ONE command. So
there are no independent "demote" and "promote" commands. In a two
machine setup you cannot have two slaves at a time.
In other words: "promote" implicitly demotes the other machine, and
"demote" implicitly promotes the other machine.
So I thought I could implement "demote" as a plain "return 0", since
"promote" on the other machine does the job anyway. Well, not the
best idea: a "monitor" action on the apparently demoted machine will
still report Master status until "promote" on the second machine has
finished.
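To make that concrete, here is a rough sketch of how a monitor action
could derive the current role from the Slony-I catalog. The schema
name _pgcluster, the database mydb, set id 1 and the NODE_ID variable
are illustrative assumptions, not part of my actual agent; the return
codes follow the OCF convention.

```shell
#!/bin/sh
# Sketch only: _pgcluster, mydb, set id 1 and NODE_ID are assumed
# names for illustration. Return codes follow the OCF convention.

monitor() {
    origin=$(psql -Atc \
        "select set_origin from _pgcluster.sl_set where set_id = 1" \
        mydb) || return 7              # OCF_NOT_RUNNING: postgres down
    if [ "$origin" = "$NODE_ID" ]; then
        return 8                       # OCF_RUNNING_MASTER
    fi
    return 0                           # OCF_SUCCESS: running as slave
}
```

As long as the catalog still names the local node as the set origin,
this monitor reports Master - which is exactly the problem described
above.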
What if the crm delayed the slave's monitor until after the other side
was promoted... would that help significantly?
Furthermore, the switchover command will fail if the other machine
is not responding. If the current master really has a problem, all
you can do to get a writable database on the current slave is use
the failover command. But Linux-HA only knows "promote" and
"demote".
So I implemented promote and demote in the following way:
#### promote
if switchover_to_me
then
    return 0
else
    # switchover failed, e.g. because the peer is unreachable,
    # so fall back to failover
    failover_to_me
    return $?
fi
####
#### demote
switchover_to_other_machine
# don't care whether this works, as it cannot work when
# the other machine is not healthy anyway
return 0
####
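For illustration, here is the same logic as a small self-contained
shell sketch. switchover_to_me, switchover_to_other_machine and
failover_to_me are stubs standing in for the real slonik invocations
(MOVE SET / FAILOVER), and PEER_HEALTHY is just an illustrative knob
simulating whether the peer node is reachable:

```shell
#!/bin/sh
# Sketch only: the stub functions stand in for real slonik calls,
# and PEER_HEALTHY simulates peer reachability.

switchover_to_me()            { [ "${PEER_HEALTHY:-yes}" = yes ]; }
switchover_to_other_machine() { [ "${PEER_HEALTHY:-yes}" = yes ]; }
failover_to_me()              { return 0; }   # assumed to succeed

promote() {
    if switchover_to_me; then
        return 0           # clean role swap, no resync needed
    fi
    failover_to_me         # peer unreachable: last resort,
    return $?              # forces a full resync afterwards
}

demote() {
    # ignore failures: switchover cannot work when the peer is
    # unhealthy, and promote on the peer covers that case
    switchover_to_other_machine || :
    return 0
}
```

Note that promote always "succeeds" here - either via switchover or,
if that fails, via failover with all its consequences.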
What you also need to know about slony-1: after a failover, the
COMPLETE data has to be resynced. Slony-1 does not allow a failed
node to rejoin the slony cluster (even if the node was healthy when
the failover command was issued); it has to fetch ALL data from the
new master. So you want to avoid failover unless it is absolutely
necessary.
Up to now I thought my RA could handle a few cases, and it turns out
it can handle SOME (master reboot, slave reboot, controlled
switchover). But something as simple as killing postgres on the
master machine causes a failover. Why?
Say A is master, B is slave at this moment
1. monitor on A fails
2. Linux-HA executes demote on A
-> As you see above, this will work even if it does nothing
3. Linux-HA executes promote on B
-> This, as postgres on A is not running, will end up in a failover
(see above)
Notifications might help.
The Filesystem agent (when operating in OCFS2 mode) keeps a list of
who its peers are.
If you did the same then I think you'd be able to recognize that
you're all alone and that it was ok to switchover_to_me instead.
That is, if I understood what you're saying correctly.
This is pretty much it. If you have any ideas on how to improve
this, or if you also think that this is impossible with the current
master/slave implementation in Linux-HA, please respond.
The whole "separately demote and promote" approach in Linux-HA just
does not seem to fit the way slony-1 handles switchover and failover.
If you have any more questions (it may well be that I forgot
something), just ask - I'll be happy to help improve Linux-HA.
Best regards
Dominik
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems