On Mon, Jan 20, 2014 at 5:46 PM, Apollon Oikonomopoulos
<[email protected]>wrote:

> Hi Dimitris,
>
> On 17:47 Mon 20 Jan     , Dimitris Aragiorgis wrote:
> > Hi all!
> >
> >
> > Recently we observed some strange TypeErrors in node-daemon.log related
> > to QMP message handling. Specifically the exceptions were inside
> > GetInstanceInfo().
> >
> > Please do the following test:
> >
> > 1) Client A connects to qmp socket with socat
> > 2) Client A gets greeting message {"QMP": {"version": ..}
> > 3) Client A waits (select on the socket's fd)
> > 4) Client B tries to connect to the *same* qmp socket with socat
> > 5) Client B does *NOT* get any greeting message
> > 6) Client B waits (select on the socket's fd)
> > 7) Client B closes connection (kill socat)
> > 8) Client A quits too
> > 9) Client C connects to qmp socket
> > 10) Client C gets *two* greeting messages!!!
> >
> > This case is reproducible in Ganeti:
> > 1) Allocator runs and invokes GetAllInstancesInfo()
>
> Doesn't the IAllocator run with the nodes locked?
>

It does, and all of them if opportunistic locking is not used, yet instance
info calls do not require these locks, AFAIK.


> > 2) GetInstanceInfo() creates a QMPConnection()
> > 3) Meanwhile a client wants to get instance runtime info
> > 4) GetInstanceInfo() creates a new QMPConnection() but _Recv() blocks
> > when trying to read from the socket and a timeout exception is raised.
> > This is caught and re-raised as HypervisorError, which is *silently*
> > caught inside GetInstanceInfo()
> > 5) First connection closes
> > 6) When a third connection arrives, it will get *two* greetings!
> > 7) The first greeting will be consumed inside connect()
> > 8) The second will be consumed as the response to the qmp_capabilities
> > command
> > 9) info cpus will get {'return': {}}!! which will result in a TypeError.
> >
> > After digging qemu source we found that monitor's output buffer is
> > shared between clients. Only one client can interact with the monitor.
> > The others are considered as blocked. Still their greeting message is
> > written to the output buffer and a callback is registered to flush the
> > message to the socket when possible. This means that if a client closes
> > the connection before getting the greeting, these data remain in the
> > buffer and will be flushed upon a future connection.
>
> This sounds like a qemu bug, and as such it should IMHO be discussed
> with the qemu upstream first. Is it also present in qemu-1.6? Earlier
> versions?
>

It should definitely be discussed with the qemu upstream, yet if this is
reproducible, with any version of qemu, Ganeti should provide a workaround.
Chances are that many users will be stuck with a certain qemu version for a
while, and we don't want to be getting bug reports about this a year or two
from now.
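For illustration, here is a rough Python model of the buffering behaviour
described above (this is a made-up in-process sketch, not qemu's actual
code): one output buffer shared by all clients, a greeting queued for every
connection attempt, and unread data flushed to whichever client reads next.

```python
# Hypothetical model of the shared-monitor-buffer behaviour described
# above -- for illustration only, not qemu code.

GREETING = '{"QMP": {"version": {}}}\n'

class SharedMonitor:
    def __init__(self):
        self.out_buf = ""          # output buffer shared by all clients

    def connect(self):
        # A greeting is written to the shared buffer for *every* client,
        # even ones that never get around to reading it.
        self.out_buf += GREETING
        return Client(self)

class Client:
    def __init__(self, mon):
        self.mon = mon

    def read_all(self):
        # Reading drains whatever is currently in the shared buffer.
        data, self.mon.out_buf = self.mon.out_buf, ""
        return data

mon = SharedMonitor()
a = mon.connect()
a.read_all()            # client A consumes its own greeting
b = mon.connect()       # client B's greeting is queued...
                        # ...but B disconnects without ever reading it
c = mon.connect()
stale = c.read_all()    # client C now sees *two* greetings
print(stale.count('"QMP"'))  # -> 2
```

This mirrors steps 1-10 of the socat test quoted earlier: the stale
greeting left behind by client B is delivered to client C.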


> >
> > To fix this on the Ganeti side, first of all, a finally: section that
> > closes the socket must be added for each QMPConnection(); that leaves a
> > smaller window for more than one concurrent connection. Additionally,
> > we should consume as many greetings as are available when starting the
> > connection, and finally parse the output of the qmp_capabilities
> > command.
>
> On top of that you would need a per-monitor lock to be safe, but even if
> you do all of this, you will not be safe from external tools trying to
> use the QMP socket (including the KVM daemon IIUC?).
>

This is a valid point, yet I believe that's what the socket-closing code
was meant to mitigate.
Short of a qemu fix, I do not believe we can be completely safe - the best
we can do is warn the user of the unfortunate interaction when extra
greeting messages arrive.
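A drain-and-warn step along those lines could look roughly like this
(function and message names are hypothetical; a real implementation would
recv() from the QMP socket with a timeout rather than operate on an
in-memory string):

```python
import json
import logging

def drain_greetings(buf):
    """Consume every pending greeting from a raw QMP stream.

    Hypothetical helper for illustration.  Returns the remaining
    unparsed data and the number of greetings consumed; seeing more
    than one greeting means a previous client disconnected before
    reading its own, so we warn about it.
    """
    greetings = 0
    while True:
        line, sep, rest = buf.partition("\n")
        if not sep:
            break
        msg = json.loads(line)
        if "QMP" not in msg:
            break            # first non-greeting message: stop draining
        greetings += 1
        buf = rest
    if greetings > 1:
        logging.warning("Consumed %d QMP greetings; a previous client "
                        "probably disconnected before reading its own",
                        greetings)
    return buf, greetings

# Two stale greetings followed by a real reply:
stream = ('{"QMP": {"version": {}}}\n'
          '{"QMP": {"version": {}}}\n'
          '{"return": {}}\n')
rest, n = drain_greetings(stream)
print(n)  # -> 2
```

Combined with a finally: block that always closes the socket, this would
at least keep Ganeti itself from tripping over stale greetings, even if
external tools on the same socket remain a problem.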


>
> Regards,
> Apollon
>
