On Thu, Apr 12, 2012 at 5:26 PM, Lars Ellenberg
<[email protected]> wrote:
> On Wed, Apr 11, 2012 at 08:22:59AM +1000, Andrew Beekhof wrote:
>> It looks like the drbd RA is calling crm_master during the monitor action.
>> That wouldn't seem like a good idea as the value isn't counted until
>> the resource is started and if the transition is interrupted (as it is
>> here) then the PE won't try to promote it (because the value didn't
>> change).
>
> I did not get the last part.
> Why would it not be promoted,
> even though it has positive master score?

Because we don't know that we need to run the PE again: the only
changes we saw were ones the transition already expected.

See:
https://github.com/beekhof/pacemaker/commit/65f1a22a4b66581159d8b747dbd49fa5e2ef34e1

This "only" becomes an issue when the transition is interrupted
between the non-recurring monitor and the start, which I guess was
rare enough that we hadn't noticed it for 4 years :-(

>
>> Has the drbd RA always done this?
>
> Yes.
>
> When else should we call crm_master?

I guess the only situation in which you shouldn't is during a
non-recurring monitor (a probe), when you're about to return 7
(OCF_NOT_RUNNING).
Which I'll concede isn't exactly obvious.
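To make the rule concrete, here is a minimal, hedged sketch of that monitor logic. It is NOT the actual drbd RA code: drbd_status and crm_master are stubbed stand-ins for the real helpers, the score value 100 is arbitrary, and only the control flow (skip crm_master when a probe is about to return 7) reflects the point above.

```shell
#!/bin/sh
# Sketch only: skip crm_master when a probe (non-recurring monitor)
# is about to return OCF_NOT_RUNNING.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

drbd_status() { echo "${DRBD_STATE:-Unconfigured}"; }  # stand-in helper
crm_master()  { echo "crm_master $*"; }                # stub of the real CLI

# A probe is a monitor with interval 0 (non-recurring).
is_probe() { [ "${OCF_RESKEY_CRM_meta_interval:-0}" -eq 0 ]; }

drbd_monitor() {
    rc=$OCF_SUCCESS
    if [ "$(drbd_status)" = "Unconfigured" ]; then
        rc=$OCF_NOT_RUNNING
    fi

    if is_probe && [ "$rc" -eq "$OCF_NOT_RUNNING" ]; then
        # Probe found us stopped: leave the master score alone.
        # The score isn't counted until after start, so setting it now
        # means an interrupted transition sees "no change" and the PE
        # is never re-run to promote us.
        :
    else
        crm_master -l reboot -v 100
    fi
    return $rc
}
```

The interesting branch is the probe-plus-7 case doing nothing; every other monitor outcome still advertises the score, as the RA does today.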

>
> Preference changes: we may lose a local disk,
> we may have been outdated or inconsistent,
> then sync up, etc.
>
>> On Sat, Mar 31, 2012 at 2:56 AM, William Seligman
>> <[email protected]> wrote:
>> > On 3/30/12 1:13 AM, Andrew Beekhof wrote:
>> >> On Fri, Mar 30, 2012 at 2:57 AM, William Seligman
>> >> <[email protected]> wrote:
>> >>> On 3/29/12 3:19 AM, Andrew Beekhof wrote:
>> >>>> On Wed, Mar 28, 2012 at 9:12 AM, William Seligman
>> >>>> <[email protected]> wrote:
>> >>>>> The basics: Dual-primary cman+pacemaker+drbd cluster running on 
>> >>>>> RHEL6.2; spec
>> >>>>> files and versions below.
>> >>>>>
>> >>>>> Problem: If I restart both nodes at the same time, or even just start 
>> >>>>> pacemaker
>> >>>>> on both nodes at the same time, the drbd ms resource starts, but both 
>> >>>>> nodes stay
>> >>>>> in slave mode. They'll both stay in slave mode until one of the 
>> >>>>> following occurs:
>> >>>>>
>> >>>>> - I manually type "crm resource cleanup <ms-resource-name>"
>> >>>>>
>> >>>>> - 15 minutes elapse. Then the "PEngine Recheck Timer" is fired, and 
>> >>>>> the ms
>> >>>>> resources are promoted.
>> >>>>>
>> >>>>> The key resource definitions:
>> >>>>>
>> >>>>> primitive AdminDrbd ocf:linbit:drbd \
>> >>>>>         params drbd_resource="admin" \
>> >>>>>         op monitor interval="59s" role="Master" timeout="30s" \
>> >>>>>         op monitor interval="60s" role="Slave" timeout="30s" \
>> >>>>>         op stop interval="0" timeout="100" \
>> >>>>>         op start interval="0" timeout="240" \
>> >>>>>         meta target-role="Master"
>> >>>>> ms AdminClone AdminDrbd \
>> >>>>>         meta master-max="2" master-node-max="1" clone-max="2" \
>> >>>>>         clone-node-max="1" notify="true" interleave="true"
>> >>>>> # The lengthy definition of "FilesystemGroup" is in the crm pastebin below
>> >>>>> clone FilesystemClone FilesystemGroup \
>> >>>>>         meta interleave="true" target-role="Started"
>> >>>>> colocation Filesystem_With_Admin inf: FilesystemClone AdminClone:Master
>> >>>>> order Admin_Before_Filesystem inf: AdminClone:promote FilesystemClone:start
>> >>>>>
>> >>>>> Note that I stuck in "target-role" options to try to solve the 
>> >>>>> problem; no effect.
>> >>>>>
>> >>>>> When I look in /var/log/messages, I see no error messages or 
>> >>>>> indications why the
>> >>>>> promotion should be delayed. The 'admin' drbd resource is reported as 
>> >>>>> UpToDate
>> >>>>> on both nodes. There are no error messages when I force the issue with:
>> >>>>>
>> >>>>> crm resource cleanup AdminClone
>> >>>>>
>> >>>>> It's as if pacemaker, at start, needs some kind of "kick" after the 
>> >>>>> drbd
>> >>>>> resource is ready to be promoted.
>> >>>>>
>> >>>>> This is not just an abstract case for me. At my site, it's not 
>> >>>>> uncommon for
>> >>>>> there to be lengthy power outages that will bring down the cluster. 
>> >>>>> Both systems
>> >>>>> will come up when power is restored, and I need for cluster services 
>> >>>>> to be
>> >>>>> available shortly afterward, not 15 minutes later.
>> >>>>>
>> >>>>> Any ideas?
>> >>>>
>> >>>> Not without any logs
>> >>>
>> >>> Sure! Here's an extract from the log: <http://pastebin.com/L1ZnsQ0R>
>> >>>
>> >>> Before you click on the link (it's a big wall of text),
>> >>
>> >> I'm used to trawling the logs.  Grep is a wonderful thing :-)
>> >>
>> >> At this stage it is apparent that I need to see
>> >> /var/lib/pengine/pe-input-4.bz2 from hypatia-corosync.
>> >> Do you have this file still?
>> >
>> > No, so I re-ran the test. Here's the log extract from the test I did today
>> > <http://pastebin.com/6QYH2jkf>.
>> >
>> > Based on what you asked for from the previous extract, I think what you 
>> > want
>> > from this test is pe-input-5. Just to play it safe, I copied and 
>> > bunzip2'ed all
>> > three pe-input files mentioned in the log messages:
>> >
>> > pe-input-4: <http://pastebin.com/Txx50BJp>
>> > pe-input-5: <http://pastebin.com/zzppL6DF>
>> > pe-input-6: <http://pastebin.com/1dRgURK5>
>> >
>> > I pray to the gods of Grep that you find a clue in all of that!
>> >
>> >>> here are what I think
>> >>> are the landmarks:
>> >>>
>> >>> - The extract starts just after the node boots, at the start of syslog 
>> >>> at time
>> >>> 10:49:21.
>> >>> - I've highlighted when pacemakerd starts, at 10:49:46.
>> >>> - I've highlighted when drbd reports that the 'admin' resource is 
>> >>> UpToDate, at
>> >>> 10:50:10.
>> >>> - One last highlight: When pacemaker finally promotes the drbd resource 
>> >>> to
>> >>> Primary on both nodes, at 11:05:11.
>> >>>
>> >>>> Details:
>> >>>>>
>> >>>>> # rpm -q kernel cman pacemaker drbd
>> >>>>> kernel-2.6.32-220.4.1.el6.x86_64
>> >>>>> cman-3.0.12.1-23.el6.x86_64
>> >>>>> pacemaker-1.1.6-3.el6.x86_64
>> >>>>> drbd-8.4.1-1.el6.x86_64
>> >>>>>
>> >>>>> Output of crm_mon after two-node reboot or pacemaker restart:
>> >>>>> <http://pastebin.com/jzrpCk3i>
>> >>>>> cluster.conf: <http://pastebin.com/sJw4KBws>
>> >>>>> "crm configure show": <http://pastebin.com/MgYCQ2JH>
>> >>>>> "drbdadm dump all": <http://pastebin.com/NrY6bskk>
>> >
>> > --
>> > Bill Seligman             | Phone: (914) 591-2823
>> > Nevis Labs, Columbia Univ | mailto://[email protected]
>> > PO Box 137                |
>> > Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/
>> >
>> >
>> > _______________________________________________
>> > Linux-HA mailing list
>> > [email protected]
>> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
>> > See also: http://linux-ha.org/ReportingProblems
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.