On Nov 7, 2007, at 8:26 AM, Dominik Klein wrote:
Hi Andrew
thanks for your reply.
So I thought I could implement "demote" as a simple "return 0", since
"promote" on the other machine will do the job anyway. That turned out
not to be the best idea: a "monitor" action on the apparently demoted
machine will still report Master status until "promote" on the
second machine has finished.
What if the crm delayed the slave's monitor until after the other
side was promoted... would that help significantly?
That would probably prevent one failed monitor action in this very
special case.
Feel free to add it as an enhancement request in Bugzilla - I think it
makes a certain amount of sense to implement (in general, not just in
your case).
Furthermore, the switchover command will fail if the other machine
is not responding. In case the current master really has a
problem, all you can do to get a writable database on the current
slave is use the failover command. But Linux-HA only knows
"promote" and "demote".
So I implemented promote and demote the following way:
#### promote
if switchover_to_me
then
    return 0
else
    # switchover failed (e.g. the peer is unreachable),
    # so fall back to failover
    failover_to_me
    return $?
fi
####
#### demote
switchover_to_other_machine
# don't care if this works, as it cannot work if
# the other machine is not healthy
return 0
####
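The logic above can be sketched as self-contained shell functions. The
switchover/failover helpers here are stubs standing in for the real
Slony-I slonik invocations (which aren't shown in this thread); the
stubs simulate a dead peer, so switchover fails and promote falls back
to failover:

```shell
#!/bin/sh
# Stubs: hypothetical stand-ins for the real Slony-I commands.
# Simulate an unreachable peer: switchover cannot work, failover can.
switchover_to_me()            { return 1; }
switchover_to_other_machine() { return 1; }
failover_to_me()              { return 0; }

promote() {
    if switchover_to_me; then
        return 0
    fi
    # Switchover failed (peer unreachable) - fall back to failover.
    failover_to_me
}

demote() {
    # Ignore the result: switchover cannot succeed if the peer is
    # unhealthy, and the peer's promote will do the real work anyway.
    switchover_to_other_machine || true
    return 0
}

promote && echo "promote rc=0"
demote  && echo "demote rc=0"
```

Both actions report success here, which is exactly the behaviour that
causes trouble later in the thread: the cluster manager cannot tell a
no-op demote from a real one.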
What you also need to know about Slony-I is that you have to
resync the COMPLETE data after a failover. In Slony-I it is not
possible to let a failed node rejoin the Slony cluster (even if it
was healthy when the failover command was issued). It has to fetch
ALL data from the new master. So you want to avoid failover unless
it is absolutely necessary.
Up to now I thought my RA could handle a number of cases, and it
turns out it can handle some (like a master reboot, a slave reboot,
or a controlled switchover). But something as simple as killing
postgres on the master machine causes a failover. Why?
Say A is master, B is slave at this moment
1. monitor on A fails
2. Linux-HA executes demote on A
-> As you see above, this will work even if it does nothing
3. Linux-HA executes promote on B
-> This, as postgres on A is not running, will end up in a
failover (see above)
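The three steps can be replayed with a few lines of shell (illustrative
stubs only, not the real RA's internals):

```shell
#!/bin/sh
# Replay the scenario: postgres on A was killed, B is a healthy slave.
postgres_on_A_running=false

# Step 2: demote on A - returns success even though it does nothing.
demote_A() {
    echo "demote A: rc=0 (no-op, postgres is dead anyway)"
}

# Step 3: promote on B - switchover needs a live postgres on A,
# so the RA falls back to failover (forcing a full resync later).
promote_B() {
    if [ "$postgres_on_A_running" = true ]; then
        echo "promote B: clean switchover"
    else
        echo "promote B: switchover impossible -> FAILOVER"
    fi
}

demote_A
promote_B
```

The expensive failover happens even though a simple restart of
postgres on A would have been enough.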
Notifications might help.
The Filesystem agent (when operating in OCFS2 mode) keeps a list of
who its peers are.
If you did the same then I think you'd be able to recognize that
you're all alone and that it was ok to switchover_to_me instead.
Read my first post again.
I read it a few times but found myself getting lost - maybe I was just
having a bad day :-)
Switchover is not possible if the other postgres instance is not
available. The only way to make a single slave the new master is to
use the failover command.
What *would* help here is:
1. monitor on A fails -> OCF_NOT_RUNNING
Now, instead of "demote A, promote B":
2. Stop/Start the resource on A
One can certainly have demote->stop->start->promote (all on A).
Does that help? The demote shouldn't be a problem for you because it's
a no-op.
The notify data will also tell you that you used to be a master and
that no-one else is one at the moment... in theory that should be
enough info to allow you to do something clever right?
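A sketch of that idea, assuming the CRM exposes notify data through
environment variables along the lines of
OCF_RESKEY_CRM_meta_notify_master_uname (the exact variable name is an
assumption here - check the notify documentation for your CRM version):

```shell
#!/bin/sh
# Hypothetical decision helper for "promote": if the notify data says
# no node is currently master, a clean switchover is impossible and
# failover is the only option. The variable name is an assumption.
promote_decide() {
    if [ -z "$OCF_RESKEY_CRM_meta_notify_master_uname" ]; then
        echo "failover_to_me"
    else
        echo "switchover_to_me"
    fi
}

OCF_RESKEY_CRM_meta_notify_master_uname=""
promote_decide      # no master anywhere -> failover

OCF_RESKEY_CRM_meta_notify_master_uname="nodeA"
promote_decide      # nodeA is still master -> switchover
```

This would let the RA escalate to failover only when it can see that
no healthy master exists, rather than on every failed switchover.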
IIRC "start" includes a monitor action (sometimes called a "probe"
in this case).
Not really... we probe only when we don't know the current state of a
resource on a node.
So generally this only happens at startup or when the admin uses
crm_resource -C
This would report "OCF_RUNNING_MASTER", so the problem would be
solved.
On the other hand, this is probably a pretty big change in Linux-HA's
master/slave handling and it should be discussed.
Regards
Dominik
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems