On Wed, Mar 28, 2012 at 9:12 AM, William Seligman <[email protected]> wrote:
> The basics: Dual-primary cman+pacemaker+drbd cluster running on RHEL6.2; spec
> files and versions below.
>
> Problem: If I restart both nodes at the same time, or even just start
> pacemaker on both nodes at the same time, the drbd ms resource starts, but
> both nodes stay in slave mode. They'll both stay in slave mode until one of
> the following occurs:
>
> - I manually type "crm resource cleanup <ms-resource-name>"
>
> - 15 minutes elapse. Then the "PEngine Recheck Timer" is fired, and the ms
>   resources are promoted.
>
> The key resource definitions:
>
> primitive AdminDrbd ocf:linbit:drbd \
>     params drbd_resource="admin" \
>     op monitor interval="59s" role="Master" timeout="30s" \
>     op monitor interval="60s" role="Slave" timeout="30s" \
>     op stop interval="0" timeout="100" \
>     op start interval="0" timeout="240" \
>     meta target-role="Master"
> ms AdminClone AdminDrbd \
>     meta master-max="2" master-node-max="1" clone-max="2" \
>     clone-node-max="1" notify="true" interleave="true"
> # The lengthy definition of "FilesystemGroup" is in the crm pastebin below
> clone FilesystemClone FilesystemGroup \
>     meta interleave="true" target-role="Started"
> colocation Filesystem_With_Admin inf: FilesystemClone AdminClone:Master
> order Admin_Before_Filesystem inf: AdminClone:promote FilesystemClone:start
>
> Note that I stuck in the "target-role" options to try to solve the problem;
> they had no effect.
>
> When I look in /var/log/messages, I see no error messages or any indication
> of why the promotion should be delayed. The 'admin' drbd resource is
> reported as UpToDate on both nodes. There are no error messages when I force
> the issue with:
>
> crm resource cleanup AdminClone
>
> It's as if pacemaker, at start, needs some kind of "kick" after the drbd
> resource is ready to be promoted.
>
> This is not just an abstract case for me.
> At my site, it's not uncommon for there to be lengthy power outages that
> bring down the cluster. Both systems will come up when power is restored,
> and I need cluster services to be available shortly afterward, not
> 15 minutes later.
>
> Any ideas?
Not without any logs.

> Details:
>
> # rpm -q kernel cman pacemaker drbd
> kernel-2.6.32-220.4.1.el6.x86_64
> cman-3.0.12.1-23.el6.x86_64
> pacemaker-1.1.6-3.el6.x86_64
> drbd-8.4.1-1.el6.x86_64
>
> Output of crm_mon after two-node reboot or pacemaker restart:
> <http://pastebin.com/jzrpCk3i>
> cluster.conf: <http://pastebin.com/sJw4KBws>
> "crm configure show": <http://pastebin.com/MgYCQ2JH>
> "drbdadm dump all": <http://pastebin.com/NrY6bskk>
> --
> Bill Seligman             | Phone: (914) 591-2823
> Nevis Labs, Columbia Univ | mailto://[email protected]
> PO Box 137                |
> Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
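
[Editorial note on the 15-minute figure above: it matches the default of
Pacemaker's cluster-recheck-interval property, which drives the "PEngine
Recheck Timer". Lowering that property only makes the cluster retry the
stalled promotion sooner; it is a stopgap, not a root-cause fix. A hedged
sketch using crmsh, with the 2-minute value being an arbitrary example:]

```shell
# Stopgap, not a fix: lower the recheck timer from its 15-minute
# default so a stalled promotion is retried sooner.
# The "2min" value is an example, not a recommendation.
crm configure property cluster-recheck-interval="2min"

# Confirm the property took effect:
crm configure show | grep cluster-recheck-interval
```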
