On 24.01.2013 at 10:37, Lars Ellenberg wrote:

On Thu, Jan 24, 2013 at 10:09:30AM +0100, Helmut Wollmersdorfer wrote:


The error message comes from drbd/user/drbdadm_parser.c ->
vcheck_uniq(). It checks whether certain keys in the config, such as
the resource name, are unique. For some reason this check seems to be
triggered via pacemaker/crm -> ocf:linbit:drbd -> monitor(?), as far
as I understand the code.

Right.

pacemaker monitor for some resource (maybe drbd8_1) was executed,

First it was executed for drbd5_1.

Jan 8 15:29:31 xen11 lrmd: [2403]: info: RA output: (xen_drbd5_1:1:monitor:stderr) drbd.d/drbd8_1.res:1: conflicting use of resource section 'drbd8_1' ...#012drbd.d/drbd10_1.res:1: resource section 'drbd8_1' first used here.

from here

primitive xen_drbd5_1 ocf:linbit:drbd \
    params drbd_resource="drbd5_1" \
    op monitor interval="15s" \
    op start interval="0" timeout="240s" \
    op stop interval="0" timeout="100s"

but found two definitions of that resource.
It did not know what to do, got confused, and errored out with "generic error".
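
To illustrate the kind of uniqueness check that fails here, the following is a minimal Python sketch that looks for resource names defined in more than one place. It is not the drbdadm parser and does not reproduce vcheck_uniq(); the function name, the simplified regular expression, and the file contents are assumptions made up for this example. Only the file names and resource names are taken from the log message above.

```python
import re
from collections import defaultdict

# Simplified pattern for the start of a resource section, e.g. `resource drbd8_1 {`.
# Assumption: one section header per line; the real config grammar is richer.
RESOURCE_RE = re.compile(r'^\s*resource\s+"?([\w.-]+)"?\s*\{?')


def find_duplicate_resources(files):
    """Return {resource_name: [(file_name, line_no), ...]} for every
    resource name that is defined more than once.

    `files` maps a file name to its content as a string.
    """
    seen = defaultdict(list)
    for name in sorted(files):
        for line_no, line in enumerate(files[name].splitlines(), start=1):
            match = RESOURCE_RE.match(line)
            if match:
                seen[match.group(1)].append((name, line_no))
    return {res: locs for res, locs in seen.items() if len(locs) > 1}


# Hypothetical file contents: drbd10_1.res was presumably copied from
# drbd8_1.res without renaming the resource section.
configs = {
    "drbd.d/drbd8_1.res": "resource drbd8_1 {\n  protocol C;\n}\n",
    "drbd.d/drbd10_1.res": "resource drbd8_1 {\n  protocol C;\n}\n",
    "drbd.d/drbd5_1.res": "resource drbd5_1 {\n  protocol C;\n}\n",
}

duplicates = find_duplicate_resources(configs)
for resource, locations in duplicates.items():
    where = ", ".join(f"{fname}:{line}" for fname, line in locations)
    print(f"resource section '{resource}' defined more than once: {where}")
```

Running a check like this over the config directory before reloading the cluster configuration would flag the copied file early, independent of which resource the monitor operation happens to be running for.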

Pacemaker recovery for "generic error" is -> stop.

It stopped all resources running on node xen11:

Jan 8 15:29:37 xen11 Xen[32084]: INFO: Xen domain www will be stopped (timeout: 26s)
Jan 8 15:29:37 xen11 Xen[32089]: INFO: Xen domain mail4 will be stopped (timeout: 26s)
Jan 8 15:29:37 xen11 Xen[32086]: INFO: Xen domain typo3 will be stopped (timeout: 26s)

www:    drbd1_1, drbd1_2
mail4:  drbd8_1 [*], drbd8_2
typo3:  drbd2_1, drbd2_2

[*] This is the only "suspicious" one.

From my naive point of view, I would never have expected this behaviour.

BTW: I remember that something like a state-transition graph existed for DRBD-0.7. It would be nice to have something human- (and machine-) readable to create automated tests, or to review the current test coverage.

Helmut Wollmersdorfer
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
