After some investigation I've found that qemu 2.3.0 is indeed
broken, at least for the way CloudStack (CS) uses the qemu chardev/socket.
I'm not sure which version introduced the regression, but it was fixed in
2.4.0-rc3, and the commit message for the fix specifically notes that
CloudStack 4.2 had stopped working.
qemu git commit: 4bf1cb03fbc43b0055af60d4ff093d6894aa4338
I'm also attaching the patch from that commit.
For our own purposes I've added the patch to our qemu-kvm-ev package
(2.3.0) and all is well.
On 2016-10-20 09:59, Linas Žilinskas wrote:
Hi.
We have upgraded to 4.9.
We run custom-built packages with our own patches, which as far as I can
tell (I'm the only one patching them) should not affect the issue I'll
describe.
I'm not sure whether we simply didn't notice it before, or whether it's
actually related to something in 4.9.
Basically, our system VMs could not be patched via the qemu
socket. The script simply errored out with a timeout while trying to
push the data to the socket.
Executing it manually (with the command line from the logs) gave the
same result. I even tried the old Perl variant, which also failed in
the same way.
We finally found out that this issue happens only on our HVs that
run qemu 2.3.0 from the CentOS 7 Virtualization Special Interest Group
(SIG) repo. The other ones, which run qemu 1.5 from the official repos,
can patch the system VMs fine.
So I'm wondering whether anyone has tested 4.9 on KVM with qemu >= 2.x.
Maybe it's something else specific to our setup, e.g. we're running the
HVs from a preconfigured netboot image (PXE), but that applies to all of
them, including those with qemu 1.5, so I have no idea.
Linas Žilinskas
Head of Development
Website: <http://www.host1plus.com/>
Facebook: <https://www.facebook.com/Host1Plus>
Twitter: <https://twitter.com/Host1Plus>
LinkedIn: <https://www.linkedin.com/company/digital-energy-technologies-ltd.>
Host1Plus is a division of Digital Energy Technologies Ltd.
26 York Street, London W1U 6PZ, United Kingdom
From 4bf1cb03fbc43b0055af60d4ff093d6894aa4338 Mon Sep 17 00:00:00 2001
From: Nils Carlson <pyssl...@ludd.ltu.se>
Date: Sun, 19 Jul 2015 20:39:56 +0000
Subject: [PATCH] qemu-char: Fix missed data on unix socket
Commit 812c1057 introduced HUP detection on unix and tcp sockets prior
to a read in tcp_chr_read. This unfortunately broke CloudStack 4.2
which relied on the old behaviour where data on a socket was readable
even if a HUP was present.
A working solution is to properly check the return values from recv,
handling a closed socket once there is no more data to read.
Also enable polling for G_IO_NVAL to ensure the callback is called
for all possible events as these should now be possible to handle
with the improved error detection.
Signed-off-by: Nils Carlson <pyssl...@ludd.ltu.se>
Message-Id: <1437338396-22336-1-git-send-email-pyssl...@ludd.ltu.se>
[Do not handle EINTR; use socket_error(). - Paolo]
Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 qemu-char.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index 3200200..d956f8d 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -807,7 +807,8 @@ static gboolean io_watch_poll_prepare(GSource *source, gint *timeout_)
     }
 
     if (now_active) {
-        iwp->src = g_io_create_watch(iwp->channel, G_IO_IN | G_IO_ERR | G_IO_HUP);
+        iwp->src = g_io_create_watch(iwp->channel,
+                                     G_IO_IN | G_IO_ERR | G_IO_HUP | G_IO_NVAL);
         g_source_set_callback(iwp->src, iwp->fd_read, iwp->opaque, NULL);
         g_source_attach(iwp->src, NULL);
     } else {
@@ -2856,12 +2857,6 @@ static gboolean tcp_chr_read(GIOChannel *chan, GIOCondition cond, void *opaque)
     uint8_t buf[READ_BUF_LEN];
     int len, size;
 
-    if (cond & G_IO_HUP) {
-        /* connection closed */
-        tcp_chr_disconnect(chr);
-        return TRUE;
-    }
-
     if (!s->connected || s->max_size <= 0) {
         return TRUE;
     }
@@ -2869,7 +2864,9 @@ static gboolean tcp_chr_read(GIOChannel *chan, GIOCondition cond, void *opaque)
     if (len > s->max_size)
         len = s->max_size;
     size = tcp_chr_recv(chr, (void *)buf, len);
-    if (size == 0) {
+    if (size == 0 ||
+        (size < 0 &&
+         socket_error() != EAGAIN && socket_error() != EWOULDBLOCK)) {
         /* connection closed */
         tcp_chr_disconnect(chr);
     } else if (size > 0) {
--
2.9.2
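
For anyone wondering what the commit message above means in practice, here is a minimal sketch of the situation. It is not qemu or CloudStack code; it assumes Linux and a plain AF_UNIX socketpair, and the names run_hup_demo and struct hup_demo are made up for illustration. The peer writes a message and closes right away, which is roughly what a short-lived client pushing data into the chardev socket does. The reader then drains the socket and treats it as closed only once recv() returns 0 or a real error, which is the approach the patch takes instead of disconnecting as soon as a hangup is flagged.

```c
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result record for one demo run. */
struct hup_demo {
    short revents;     /* events poll() reported after the peer closed */
    size_t bytes_read; /* bytes still readable after the peer closed */
    int closed;        /* 1 once recv() signalled EOF or a hard error */
};

/* Hypothetical helper: the peer sends msg and closes immediately;
 * the reader polls, then drains the socket before giving up on it. */
static int run_hup_demo(const char *msg, struct hup_demo *out)
{
    int sv[2];
    char buf[256];
    struct pollfd pfd;

    memset(out, 0, sizeof(*out));
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }

    /* Writer side: push the data, then hang up straight away. */
    if (send(sv[1], msg, strlen(msg), 0) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    close(sv[1]);

    /* Reader side: the data is queued even though the peer is gone. */
    pfd.fd = sv[0];
    pfd.events = POLLIN;
    pfd.revents = 0;
    if (poll(&pfd, 1, 1000) < 0) {
        close(sv[0]);
        return -1;
    }
    out->revents = pfd.revents;

    /* Same idea as the patched tcp_chr_read(): keep reading, and only
     * treat the connection as closed on EOF or a non-EAGAIN error. */
    for (;;) {
        ssize_t n = recv(sv[0], buf, sizeof(buf), 0);
        if (n > 0) {
            out->bytes_read += (size_t)n;
            continue;
        }
        if (n == 0 || (errno != EAGAIN && errno != EWOULDBLOCK)) {
            out->closed = 1;
        }
        break;
    }

    close(sv[0]);
    return 0;
}
```

Calling run_hup_demo("hello", &r) from a small main() and printing r.revents, r.bytes_read and r.closed should show the queued bytes arriving before the close is seen. A reader that bails out on the hangup flag alone, as the code removed by the patch did, would never get to that data.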