Hi,
On Tue, Oct 08, 2013 at 01:52:56PM +0200, Ulrich Windl wrote:
> Hi!
>
> I thought I would never be bitten by this bug, but I actually was! Now I'm
> wondering whether the Xen RA sees the guest if you use pygrub and pygrub is
> still counting down before the actual boot...
>
> But the reason why I'm writing is that I think I've discovered another bug in
> the RA:
>
> CRM decided to "recover" the guest VM "v02":
> [...]
> lrmd: [14903]: info: operation monitor[28] on prm_xen_v02 for client 14906:
> pid 19516 exited with return code 7
> [...]
> pengine: [14905]: notice: LogActions: Recover prm_xen_v02 (Started h05)
> [...]
> crmd: [14906]: info: te_rsc_command: Initiating action 5: stop
> prm_xen_v02_stop_0 on h05 (local)
> [...]
> Xen(prm_xen_v02)[19552]: INFO: Xen domain v02 already stopped.
> [...]
> lrmd: [14903]: info: operation stop[31] on prm_xen_v02 for client 14906: pid
> 19552 exited with return code 0
> [...]
> crmd: [14906]: info: te_rsc_command: Initiating action 78: start
> prm_xen_v02_start_0 on h05 (local)
> lrmd: [14903]: info: rsc:prm_xen_v02 start[32] (pid 19686)
> [...]
> lrmd: [14903]: info: RA output: (prm_xen_v02:start:stderr) Error: Domain 'v02'
> already exists with ID '3'
> lrmd: [14903]: info: RA output: (prm_xen_v02:start:stdout) Using config file
> "/etc/xen/vm/v02".
> [...]
> lrmd: [14903]: info: operation start[32] on prm_xen_v02 for client 14906: pid
> 19686 exited with return code 1
> [...]
> crmd: [14906]: info: process_lrm_event: LRM operation prm_xen_v02_start_0
> (call=32, rc=1, cib-update=5271, confirmed=true) unknown error
> crmd: [14906]: WARN: status_from_rc: Action 78 (prm_xen_v02_start_0) on h05
> failed (target: 0 vs. rc: 1): Error
> [...]
>
> As you can clearly see, "start" failed because the guest was already up!
> IMHO this is a bug in the RA (SLES11 SP2: resource-agents-3.9.4-0.26.84).
Yes, I've seen that. It's basically the same issue, i.e. the
domain being gone for a while and then reappearing.
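
To illustrate what "gone for a while and then reappearing" means for a
status check, here is a minimal sketch that probes several times before
reporting the domain as not running. It is not the code from the RA;
domain_present, status_with_retry and MISSING_PROBES are hypothetical names
made up for this example, and the probe is a stub so the snippet runs
without Xen.

```shell
# Hypothetical probe. A real agent would query Xen here (for example with
# "xm list"). For demonstration it reports the domain as missing for the
# first $MISSING_PROBES calls and as present afterwards.
MISSING_PROBES=${MISSING_PROBES:-2}
probe_count=0
domain_present() {
    probe_count=$((probe_count + 1))
    [ "$probe_count" -gt "$MISSING_PROBES" ]
}

# Probe up to $1 times, sleeping $2 seconds between attempts.
# Returns 0 if the domain shows up, 7 (OCF_NOT_RUNNING) otherwise.
status_with_retry() {
    tries=$1
    delay=${2:-1}
    while [ "$tries" -gt 0 ]; do
        if domain_present; then
            return 0
        fi
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 7
}
```

With a check like this, a domain that disappears for one or two probes
would not be reported as stopped right away, at the price of a slower
answer when it really is gone.
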
> I guess the following test is problematic:
> ---
> xm create ${OCF_RESKEY_xmfile} name=$DOMAIN_NAME
> rc=$?
> if [ $rc -ne 0 ]; then
>     return $OCF_ERR_GENERIC
> fi
> ---
> Here "xm create" probably fails if the guest is already created...
And it should fail in that case. Note that this is a race, but the race is
caused by the strange behaviour of Xen in the first place. With the recent
fix (or workaround) in the RA, this shouldn't happen anymore.
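
For the sake of illustration only, a start action can be made tolerant of a
domain that reappears between "stop" and "start" by checking for it before
and after the create call. This is a rough sketch, not the change made in
the RA. domain_exists, create_domain, start_domain and STATE_DIR are
hypothetical stand-ins, and the domain state is faked with files so the
snippet runs without Xen.

```shell
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# Fake domain registry: one file per "running" domain (assumption for the demo).
STATE_DIR=${STATE_DIR:-$(mktemp -d)}

# Hypothetical stand-in for a query against Xen.
domain_exists() {
    [ -e "$STATE_DIR/$1" ]
}

# Hypothetical stand-in for "xm create": fails if the domain already exists.
create_domain() {
    if domain_exists "$1"; then
        echo "Error: Domain '$1' already exists" >&2
        return 1
    fi
    : > "$STATE_DIR/$1"
}

start_domain() {
    name=$1
    # Already up (for example, it reappeared after a reboot): nothing to do.
    if domain_exists "$name"; then
        echo "INFO: domain $name already running"
        return $OCF_SUCCESS
    fi
    if create_domain "$name"; then
        return $OCF_SUCCESS
    fi
    # Create failed: the domain may have reappeared in the meantime.
    if domain_exists "$name"; then
        return $OCF_SUCCESS
    fi
    return $OCF_ERR_GENERIC
}
```

Calling start_domain twice in a row for the same name succeeds both times,
whereas the bare create call fails on the second attempt.
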
Thanks,
Dejan
> Regards,
> Ulrich
>
>
> >>> Dejan Muhamedagic <[email protected]> wrote on 01.10.2013 at 12:24 in
> message <[email protected]>:
> > Hi,
> >
> > On Tue, Oct 01, 2013 at 12:13:02PM +0200, Lars Marowsky-Bree wrote:
> >> On 2013-10-01T00:53:15, Tom Parker <[email protected]> wrote:
> >>
> >> > Thanks for paying attention to this issue (not really a bug) as I am
> >> > sure I am not the only one with this issue. For now I have set all my
> >> > VMs to destroy so that the cluster is the only thing managing them but
> >> > this is not super clean as I get failures in my logs that are not really
> >> > failures.
> >>
> >> It is very much a severe bug.
> >>
> >> The Xen RA has gained a workaround for this now, but we're also pushing
> >
> > Take a look here:
> >
> > https://github.com/ClusterLabs/resource-agents/pull/314
> >
> > Thanks,
> >
> > Dejan
> >
> >> the Xen team (where the real problem is) to investigate and fix.
> >>
> >>
> >> Regards,
> >> Lars
> >>
> >> --
> >> Architect Storage/HA
> >> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer,
> >> HRB 21284 (AG Nürnberg)
> >> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
> >>
> >> _______________________________________________
> >> Linux-HA mailing list
> >> [email protected]
> >> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> >> See also: http://linux-ha.org/ReportingProblems