On 05/01/2012 01:10 PM, Laine Stump wrote:
> This patch is one alternative to solve the problem detailed in:
>
>   https://bugzilla.redhat.com/show_bug.cgi?id=816465
>
> Some other unidentified library in use by libvirtd (in another thread)
> is apparently temporarily binding to a NETLINK_ROUTE raw socket with
> an address of "pid of libvirtd" during startup. This is the same
> address used by libnl for the first netlink socket it binds, and the
> netlink socket allocated for virNetlinkEventServiceStart() happens to
> be that first socket; the result is that nl_connect() fails about
> 15-20% of the time (but apparently only if there is a guest running at
> the time libvirtd starts).
>
> Testing has shown that in the case that nl_connect fails the first
> time, retrying it after a 500msec sleep leads to success 100% of the
> time, so this patch doubles that delay (which also has 100% success
> rate).
>
> +++ b/src/util/virnetlink.c
> @@ -355,9 +355,18 @@ virNetlinkEventServiceStart(void)
> }
>
> if (nl_connect(srv->netlinknh, NETLINK_ROUTE) < 0) {
> - virReportSystemError(errno,
> - "%s", _("cannot connect to netlink socket"));
> - goto error_server;
> + /* the address that libnl wants to use for this connect ("pid
> + * of libvirtd") is sometimes temporarily in use by some other
> + * unidentified code. Retrying after a 500msec sleep has
> + * achieved 100% success rates, so we sleep for 1000msec and
> + * retry.
> + */
> + usleep(1000000);

Sleeping for 1 entire second is user-visible; if we go with this
approach, I'd rather see it as a retry loop that probes something
like once every 200ms for 5 tries (or something similar), for better
response time.
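
A minimal sketch of that kind of retry loop, not a tested patch. The
names retry_connect, connect_fn, struct fake_sock and fake_connect are
hypothetical and exist only for illustration; the 200ms interval and 5
tries are the figures suggested above, not measured values.

```c
#define _DEFAULT_SOURCE /* expose usleep() on glibc under strict -std */
#include <assert.h>
#include <unistd.h>

/* Figures from the suggestion above: 5 tries, 200ms apart. */
#define CONNECT_MAX_TRIES 5
#define CONNECT_RETRY_DELAY_US (200 * 1000)

/* Hypothetical callback type: returns 0 on success, -1 on failure. */
typedef int (*connect_fn)(void *opaque);

/*
 * Call try_connect() up to max_tries times, sleeping delay_us
 * microseconds between failed attempts. Returns 0 as soon as one
 * attempt succeeds, or -1 if every attempt fails. No sleep happens
 * after the final failure, so the worst case is
 * (max_tries - 1) * delay_us of waiting.
 */
static int
retry_connect(connect_fn try_connect, void *opaque,
              int max_tries, useconds_t delay_us)
{
    int i;

    assert(max_tries > 0);

    for (i = 0; i < max_tries; i++) {
        if (try_connect(opaque) == 0)
            return 0;
        if (i + 1 < max_tries)
            usleep(delay_us);
    }
    return -1;
}

/* Hypothetical stand-in for a socket whose address is briefly taken. */
struct fake_sock {
    int failures_before_success; /* attempts that fail before one works */
    int attempts;                /* attempts made so far */
};

static int
fake_connect(void *opaque)
{
    struct fake_sock *s = opaque;

    s->attempts++;
    return s->attempts > s->failures_before_success ? 0 : -1;
}
```

In virNetlinkEventServiceStart() the callback would presumably wrap
nl_connect(srv->netlinknh, NETLINK_ROUTE), and the existing
virReportSystemError() / goto error_server path would run only after
the last try fails. In the common case where the first attempt works,
nothing sleeps at all.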
--
Eric Blake [email protected] +1-919-301-3266
Libvirt virtualization library http://libvirt.org
-- libvir-list mailing list [email protected] https://www.redhat.com/mailman/listinfo/libvir-list
