Hi,

My cluster consists of two nodes that serve a bunch of KVM-virtualized
guests via the VirtualDomain resource agent.
I have an intermittent problem with stopping VirtualDomain resources. The
same domain can stop without errors 10 times in a row, and then the next
stop generates an error, which leads to STONITH.
The error is always the same:
cannot send monitor command '{"execute":"query-balloon"}': Connection reset by peer
Nevertheless, the domain does get stopped.
I have seen the same problem on both Scientific Linux 6.0 and Debian
Squeeze hosts. Disabling the balloon device in the domain's libvirt XML
file solves it, but that is just a workaround.
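For reference, a minimal sketch of that workaround, assuming a libvirt version that accepts model='none' for the <memballoon> element. Only the relevant part of the domain definition is shown (for this guest, the file passed as config=[/etc/libvirt/qemu/debian1.xml] in the log); everything else stays as it was. The element has to be set explicitly because libvirt adds a virtio balloon device by default for QEMU/KVM guests when none is specified.

```xml
<domain type='kvm'>
  <!-- name, memory, os, disks, interfaces, etc. unchanged and omitted here -->
  <devices>
    <!-- model='none' disables the memory balloon device;
         without this element libvirt adds a virtio balloon by default -->
    <memballoon model='none'/>
  </devices>
</domain>
```

The guest has to be restarted for the change to take effect. The trade-off is that the guest then has no balloon device, so its memory can no longer be adjusted at runtime (e.g. with virsh setmem).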
Logs:
Jul 01 12:27:45 bolek lrmd: [1882]: info: cancel_op: operation monitor[114] on ocf::VirtualDomain::vr_debian1 for client 1885, its parameters: CRM_meta_interval=[60000] CRM_meta_depth=[0] config=[/etc/libvirt/qemu/debian1.xml] depth=[0] crm_feature_set=[3.0.2] CRM_meta_name=[monitor] CRM_meta_start_delay=[10000] CRM_meta_timeout=[60000] migration_transport=[ssh] cancelled
Jul 01 12:27:55 bolek lrmd: [1882]: info: rsc:vr_debian1:136: stop
Jul 01 12:27:55 bolek lrmd: [1882]: info: RA output: (vr_debian1:stop:stdout) Domain debian1 is being shutdown
Jul 01 12:27:58 bolek lrmd: [1882]: info: RA output: (vr_debian1:stop:stderr) error: cannot send monitor command '{"execute":"query-balloon"}': Connection reset by peer
Jul 01 12:27:58 bolek lrmd: [1882]: info: RA output: (vr_debian1:stop:stderr) error: Failed to destroy domain debian1

Best regards
--
Pawel Warowny
_______________________________________________
Linux-HA mailing list
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
