return codes
[root@lpissan1001 ~]# service heartbeat stop
Stopping High-Availability services:
[ OK ]
[root@lpissan1001 ~]# drbdadm up Storage1
/dev/drbd0: Failure: (124) Device is attached to a disk (use detach first)
Command 'drbdsetup /dev/drbd0 disk /dev/sdb /dev/sdb internal
--set-defaults --create-device --on-io-error=pass_on' terminated with exit
code 10
[root@lpissan1001 ~]# drbdadm down Storage1
[root@lpissan1001 ~]# drbdadm up Storage1
[root@lpissan1001 ~]# drbdadm up Storage1
/dev/drbd0: Failure: (124) Device is attached to a disk (use detach first)
Command 'drbdsetup /dev/drbd0 disk /dev/sdb /dev/sdb internal
--set-defaults --create-device --on-io-error=pass_on' terminated with exit
code 10
[root@lpissan1001 ~]# echo $?
1
[root@lpissan1001 ~]# drbdadm down Storage1
[root@lpissan1001 ~]# echo $?
0
[root@lpissan1001 ~]# cat /proc/drbd
version: 8.3.0 (api:88/proto:86-89)
GIT-hash: 9ba8b93e24d842f0dd3fb1f9b90e8348ddb95829 build by
[email protected], 2009-02-12 13:13:30
0: cs:Unconfigured
[root@lpissan1001 ~]#
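
(If useful, the same sequence can be rerun with each exit code captured
explicitly; a rough sketch, assuming the resource is still Storage1:)

  for cmd in "drbdadm up Storage1" "drbdadm up Storage1" "drbdadm down Storage1"; do
      $cmd
      echo "'$cmd' exited with $?"
  done
  cat /proc/drbd
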
Jason
On Feb 12, 2009 12:24pm, Dominik Klein <[email protected]> wrote:
Did you look into the return codes and, if warranted, report this to Linbit?
That would be a big issue.
Regards
Dominik
[email protected] wrote:
> (Big Cheers and celebrations from this end!!!)
>
> Finally figured out what the problem was: it seems that the kernel
> oopses were being caused by the 8.3 version of DRBD. Once downgraded to
> 8.2.7, everything started to work as it should. Primary/Secondary
> automatic failover is in place, and resources are now following the
> DRBD master!
>
> Thanks a mill for all the help.
>
> Jason
>
> On Feb 12, 2009 8:48am, Jason Fitzpatrick <[email protected]>
> wrote:
>> Hi Dominik
>>
>> Thanks again for the feedback.
>>
>> I had noticed some kernel oopses since the last kernel update, and they
>> seem to be pointing to DRBD. I will downgrade the kernel again and see
>> if this improves things.
>>
>> Re STONITH: I uninstalled it as part of the move from heartbeat v2.1 to
>> 2.9, but must have missed this bit.
>>
>> Userland and kernel module both report the same version.
>>
>> I am on my way into the office now and will apply the changes once
>> there.
>>
>> Thanks again
>>
>> Jason
>>
>> 2009/2/12 Dominik Klein <[email protected]>
>>
>> Right, this one looks better.
>>
>> I'll refer to the nodes as 1001 and 1002.
>>
>> 1002 is your DC.
>>
>> You have stonith enabled, but no stonith devices. Disable stonith, or
>> get and configure a stonith device (_please_ don't use ssh).
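>>
>> A minimal sketch of disabling it from the shell (the exact
>> crm_attribute flags may differ by version, so treat this as an
>> assumption and check crm_attribute --help first):
>>
>>   # turn stonith off cluster-wide until a real stonith device is configured
>>   crm_attribute --type crm_config --attr-name stonith-enabled --attr-value false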
>>
>> 1002 ha-log lines 926:939: node 1002 wants to shoot 1001, but cannot
>> (l 978). It retries in l 1018 and fails again in l 1035.
>>
>> Then the cluster tries to start drbd on 1001 in l 1079, followed by a
>> bunch of kernel messages I don't understand (pretty sure _this_ is the
>> first problem you should address!), ending up in the drbd RA not being
>> able to see the Secondary state (l 1449) and considering the start
>> failed.
>>
>> The RA code for this is:
>>
>>   if do_drbdadm up $RESOURCE ; then
>>       drbd_get_status
>>       if [ "$DRBD_STATE_LOCAL" != "Secondary" ]; then
>>           ocf_log err "$RESOURCE start: not in Secondary mode after start."
>>           return $OCF_ERR_GENERIC
>>       fi
>>       ocf_log debug "$RESOURCE start: succeeded."
>>       return $OCF_SUCCESS
>>   else
>>       ocf_log err "$RESOURCE: Failed to start up."
>>       return $OCF_ERR_GENERIC
>>   fi
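>>
>> (drbd_get_status reads the local role, e.g. from /proc/drbd; a rough
>> stand-in for device minor 0, not the RA's actual code:)
>>
>>   # 8.0/8.2 print the roles as "st:Local/Peer", 8.3 prints "ro:Local/Peer"
>>   DRBD_STATE_LOCAL=$(sed -n 's/^ *0: .*\(st\|ro\):\([A-Za-z]*\)\/.*/\2/p' /proc/drbd)
>>   echo "$DRBD_STATE_LOCAL"   # e.g. "Secondary"; empty when Unconfigured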
>>
>> The cluster then successfully stops drbd again (l 1508-1511) and tries
>> to start the other clone instance (l 1523).
>>
>> The log says:
>>
>>   RA output: (Storage1:1:start:stdout) /dev/drbd0: Failure: (124) Device
>>   is attached to a disk (use detach first) Command 'drbdsetup /dev/drbd0
>>   disk /dev/sdb /dev/sdb internal --set-defaults --create-device
>>   --on-io-error=pass_on' terminated with exit code 10
>>
>>   Feb 11 15:39:05 lpissan1002 drbd[3473]: ERROR: Storage1 start: not in
>>   Secondary mode after start.
>>
>> So this is interesting: although "stop" (basically drbdadm down)
>> succeeded, the drbd device is still attached!
>>
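>> (If you hit that state by hand, something like the following might
>> clear it; a sketch only, untested here:)
>>
>>   drbdsetup /dev/drbd0 detach   # release the lower-level disk, as the error message suggests
>>   drbdadm up Storage1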
>>
>> Please try:
>>
>>   # stop the cluster first, then:
>>   drbdadm up $resource
>>   drbdadm up $resource   # again
>>   echo $?
>>   drbdadm down $resource
>>   echo $?
>>   cat /proc/drbd
>>
>> Btw: Does your userland match your kernel module version?
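>>
>> A quick way to compare the two might be (drbdadm -V prints the userland
>> version variables, if memory serves; worth double-checking):
>>
>>   head -n1 /proc/drbd   # version of the loaded kernel module
>>   drbdadm -V            # version of the userland tools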
>>
>> To bring this to an end: the start of the second clone instance also
>> failed, so both instances are unrunnable on the node, and no further
>> start is tried on 1002.
>>
>> Interestingly, then (I could not see any attempt before this), the
>> cluster wants to start drbd on node 1001, but it also fails and also
>> gives those kernel messages. In l 2001, each instance has a failed
>> start on each node.
>>
>> So: find out about those kernel messages. I can't help much on that,
>> unfortunately, but there were some threads about things like that on
>> drbd-user recently. Maybe you can find answers to that problem there.
>>
>> And also: please verify the return codes of drbdadm in your case. Maybe
>> that's a drbd tools bug? (I can't say for sure; for me, "up" on an
>> already-up resource gives 1, which is ok.)
>>
>> Regards
>> Dominik
>>
>> Jason Fitzpatrick wrote:
>>
>> > it seems that I had the incorrect version of openais installed (from
>> > the fedora repo vs the HA one)
>> >
>> > I have corrected it, and hb_report ran correctly using the following:
>> >
>> >   hb_report -u root -f 3pm /tmp/report
>> >
>> > Please see attached.
>> >
>> > Thanks again
>> >
>> > Jason

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems