I completely shut down heartbeat on both nodes, cleared out the backup
cib.xml files, recopied the cib.xml from the primary node to the
secondary, then brought everything back up. This cleared the "diff"
error. The drbd master/slave pair came up as expected, but when I tried
to stop them, they eventually went into an unmanaged state. Looking at
the logs and comparing them to the stop function in the OCF script, I
noticed that "drbdadm down" was succeeding, but the subsequent status
check (via "drbdadm state") was reporting that the down had failed.
I then manually verified that the drbd processes were indeed down, and
executed the following:

[EMAIL PROTECTED] xml]# /sbin/drbdadm -c /etc/drbd.conf state pgsql
Secondary/Unknown
[EMAIL PROTECTED] xml]# cat /proc/drbd
version: 8.0.1 (api:86/proto:86)
SVN Revision: 2784 build by [EMAIL PROTECTED], 2007-04-09 11:30:31
 0: cs:Unconfigured

It's the same output on both nodes, and drbd is definitely down on
both. So /proc/drbd correctly indicates drbd is down, but the
subsequent check using drbdadm state comes back indicating one side is
up in Secondary mode, which it's not. This is why the resource is now
in unmanaged mode. Any ideas why the two tools would differ?
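
For reference, here's a rough sketch (my paraphrase, not the actual
agent code) of how I read the stop-time check in the OCF script: it
treats anything other than "Unconfigured" from drbdadm state as a
failed stop, which would explain why the resource went unmanaged:

```shell
#!/bin/sh
# Sketch of the post-"drbdadm down" verification as I understand the
# OCF stop path (my paraphrase; the real script's logic may differ).
# The stop only counts as successful if the state reads "Unconfigured".

state_says_down() {
    # $1: output of "drbdadm -c /etc/drbd.conf state <resource>"
    case "$1" in
        Unconfigured*) return 0 ;;  # device torn down, stop OK
        *)             return 1 ;;  # e.g. "Secondary/Unknown": stop "failed"
    esac
}

# The contradiction I'm seeing, fed the two outputs from above:
state_says_down "Unconfigured"      && echo "/proc/drbd view: down"
state_says_down "Secondary/Unknown" || echo "drbdadm view: still up"
```

So a stale "Secondary/Unknown" answer from drbdadm would be enough to
make the stop look failed even though /proc/drbd shows Unconfigured.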

Doug

On Fri, 2007-04-20 at 11:35 -0400, Doug Knight wrote:

> In the interim I set the filesystem group to unmanaged to test failing
> the drbd master/slave processes back and forth, using the value part
> of the place constraint. On my first attempt to switch nodes, it
> basically took both drbd processes down, and they stayed down. When I
> checked the logs on the node to which I was switching the primary drbd I
> found a message about a failed application diff. I switched the place
> constraint back to the original node. I decided to shut down heartbeat
> on the node where I was seeing the diff error; now the shutdown is hung
> and the diff error below is repeating every minute:
> 
> cib[3040]: 2007/04/20_11:24:52 WARN: cib_process_diff: Diff 0.11.587 ->
> 0.11.588 not applied to 0.11.593: current "num_updates" is greater than
> required
> cib[3040]: 2007/04/20_11:24:52 WARN: do_cib_notify: cib_apply_diff of
> <diff > FAILED: Application of an update diff failed
> cib[3040]: 2007/04/20_11:24:52 WARN: cib_process_request: cib_apply_diff
> operation failed: Application of an update diff failed
> cib[3040]: 2007/04/20_11:24:52 WARN: cib_process_diff: Diff 0.11.588 ->
> 0.11.589 not applied to 0.11.593: current "num_updates" is greater than
> required
> cib[3040]: 2007/04/20_11:24:52 WARN: do_cib_notify: cib_apply_diff of
> <diff > FAILED: Application of an update diff failed
> cib[3040]: 2007/04/20_11:24:52 WARN: cib_process_request: cib_apply_diff
> operation failed: Application of an update diff failed
> 
> 
> My boss and I are getting rather frustrated trying to get this setup
> to work. Is there something obvious I'm missing? Has anyone gotten HA
> 2.0.8 (using v2 monitoring and the drbd OCF script) with drbd version
> 8.0.1 working in a two-node cluster? I'm concerned because of the
> comment Bernhard made earlier.
> 
> Doug
> 
> On Fri, 2007-04-20 at 10:55 -0400, Doug Knight wrote:
> 
> > I changed the constraints to point to the master_slave ID, and voila,
> > even without the Filesystem resource running, the drbd resource
> > recognized the place constraint and the GUI now indicates master running
> > where I expected it to. One down, one to go. Now, just to be sure, here's
> > the modified group XML with the notify nvpair added:
> > 
> > <group ordered="true" collocated="true" id="grp_pgsql_mirror">
> >    <primitive class="ocf" type="Filesystem" provider="heartbeat"
> > id="fs_mirror">
> >      <instance_attributes id="fs_mirror_instance_attrs">
> >        <attributes>
> >          <nvpair id="fs_mirror_device" name="device"
> > value="/dev/drbd0"/>
> >          <nvpair id="fs_mirror_directory" name="directory"
> > value="/mirror"/>
> >          <nvpair id="fs_mirror_fstype" name="fstype" value="ext3"/>
> >          <nvpair id="fs_notify" name="notify" value="true"/>
> >        </attributes>
> >      </instance_attributes>
> >    </primitive>
> >    <instance_attributes id="grp_pgsql_mirror_instance_attrs">
> >      <attributes/>
> >    </instance_attributes>
> >  </group>
> > 
> > I wanted to confirm I put it in the right place, since there is an
> > instance_attributes tag both for the primitive resource within the
> > group and for the group itself. I put it in the resource tag, per your
> > statement below; is that correct?
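> > 
> > For comparison, if it belongs at the group level instead, I assume it
> > would go in the group's own (currently empty) instance_attributes
> > block, roughly like this (the nvpair id is my own placeholder):
> > 
> > <group ordered="true" collocated="true" id="grp_pgsql_mirror">
> >    <instance_attributes id="grp_pgsql_mirror_instance_attrs">
> >      <attributes>
> >        <nvpair id="grp_notify" name="notify" value="true"/>
> >      </attributes>
> >    </instance_attributes>
> >    ...
> > </group>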
> > 
> > Doug
> > 
> > On Fri, 2007-04-20 at 16:06 +0200, Andrew Beekhof wrote:
> > 
> > > On 4/20/07, Knight, Doug <[EMAIL PROTECTED]> wrote:
> > > > OK, here's what happened. The drbd resources were both successfully
> > > > running in Secondary mode on both servers, and both partitions were
> > > > synched. My Filesystem resource was stopped, with the colocation, order,
> > > > and place constraints in place. When I started the Filesystem resource,
> > > > which is part of a group, it triggered the appropriate drbd slave to
> > > > promote to master and transition to Primary. However, the Filesystem
> > > > resource did not complete or mount the partition, which I believe is
> > > > because Notify is not enabled on it. A manual cleanup finally got it to
> > > > start and mount, following all of the constraints I had defined. Next, I
> > > > tried putting the server which was drbd primary into Standby state,
> > > > which caused all kinds of problems (hung process, hung GUI, heartbeat
> > > > shutdown wouldn't complete, etc). I finally had to restart heartbeat on
> > > > the server I was trying to send into Standby state (note that this node
> > > > was also the DC at the time). So, I'm back up to where I have drbd in
> > > > slave/slave, secondary/secondary mode, and filesystem stopped.
> > > >
> > > > I wanted to add notify="true" to either the filesystem resource itself
> > > > or to its group, but the DTD does not define notify for groups (even
> > > > though for some reason the GUI thinks you CAN define the notify
> > > > attribute). I plan on eventually adding an IPaddr and a pgsql resource
> > > > to this group. So I have two questions: 1) Where does it make more sense
> > > > to add notify, at the group level or for the individual resource; and 2)
> > > > Should the DTD define notify as an attribute of groups?
> > > 
> > > add it as a resource attribute
> > > 
> > >      <group ...>
> > >         <instance_attributes id="...">
> > >           <attributes>
> > >             <nvpair id="..." name="notify" value="true"/>
> > >           </attributes>
> > >         </instance_attributes>
> > >         ...
> > >      </group>
> > > _______________________________________________
> > > Linux-HA mailing list
> > > [email protected]
> > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > See also: http://linux-ha.org/ReportingProblems
> > > 