I don't think the issue is the same. As I mentioned in the original report and in my later findings, this is a qemu-specific issue which was fixed in 2.4.0-rc3.

The issue was the way qemu exposes the socket used to communicate with the VM. It didn't queue data, so unless the VM was listening on /dev/vport.. at the moment the data was sent, it would never receive it. 2.4.0-rc3 fixed this by queueing the sent data, so that once sent, it is available (only once) when the VM checks /dev/vport..
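
To illustrate the host side of this, here is a minimal sketch of writing data to a Unix socket with a timeout. It is not the actual patchviasocket script: the function name `send_boot_args`, the socket path and the payload are hypothetical and only stand in for whatever the agent really uses. Note that a successful write only means the host side accepted the data. On an affected qemu the guest could still miss it if nothing was reading /dev/vport.. at that moment.

```python
import socket


def send_boot_args(socket_path: str, payload: bytes, timeout: float = 5.0) -> bool:
    """Write payload to a host-side Unix socket.

    Returns True if the connect and write completed, False on a timeout
    or any other socket error. Hypothetical helper for illustration only.
    """
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
            # One timeout covers both connect() and sendall().
            sock.settimeout(timeout)
            sock.connect(socket_path)
            sock.sendall(payload)
        return True
    except OSError:
        # socket.timeout is a subclass of OSError, so timeouts land here too.
        return False
```

A caller could wrap this in a retry loop and treat a `False` result as the "timeout" case seen in the agent logs.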


On 19/12/16 10:37, Wei ZHOU wrote:
Hi Linas,

It seems the issue you mentioned has been fixed by the commits for https://issues.apache.org/jira/browse/CLOUDSTACK-2823

CloudStack-agent will try to pass the boot args 30 times if the console IP is not accessible.

Weird.

-Wei

2016-12-19 10:03 GMT+01:00 Linas Žilinskas:

    From the logs it doesn't seem that the script times out. "Execution
    is successful", so it manages to pass the data over the socket.

    I guess the systemvm just doesn't configure itself for some reason.

    Also, in my personal tests, I noticed some different behaviour
    with different kernels. Don't remember the specifics right now,
    but on some combinations (qemu / kernel) the socket acted
    differently. For example the data was sent over the socket, but
    wasn't visible inside the VM. Other times the socket would be
    stuck from the host side.

    So I would suggest testing different kernels (3.x, 4.4.x, 4.8.x)
    or logging in to the system VM to see what's happening from inside.


    On 12/16/16 03:46, Syahrul Sazli Shaharir wrote:
    On 2016-12-16 11:27, Syahrul Sazli Shaharir wrote:
    On Wed, 26 Oct 2016, Linas Žilinskas wrote:

    So after some investigation I've found out that qemu 2.3.0 is
    indeed broken, at least the way CS uses the qemu chardev/socket.

    Not sure in which specific version it happened, but it was
    fixed in 2.4.0-rc3, specifically noting that CloudStack 4.2 was
    not working.

    qemu git commit: 4bf1cb03fbc43b0055af60d4ff093d6894aa4338

    Also attaching the patch from that commit.


    For our own purposes I've applied the patch to the qemu-kvm-ev
    package (2.3.0) and all is well.

    Hi,

    I am facing the exact same issue on the latest CloudStack 4.9.0.1, on
    the latest CentOS 7.3.1611, with the latest qemu-kvm-ev-2.6.0-27.1.el7
    package.

    The issue initially surfaced following a heartbeat-induced reset of
    all hosts, when it was on CS 4.8 @ CentOS 7.0 and stock
    qemu-kvm-1.5.3. Since then, the patchviasocket.pl/py timeouts have
    persisted for 1 out of 4 router VMs/networks, even after upgrading
    to the latest code. (I have checked the qemu-kvm-ev-2.6.0-27.1.el7
    source, and the patched code is pretty much still intact, as per
    the 2.4.0-rc3 commit.)

    Any help would be greatly appreciated.

    Thanks.

    (Attached are some debug logs from the host's agent.log)

    Here are the debug logs as mentioned: http://pastebin.com/yHdsMNzZ

    Thanks.


    --sazli



    On 2016-10-20 09:59, Linas Žilinskas wrote:

     Hi.

     We have upgraded to 4.9.

     These are custom-built packages with our own patches, which to my
     mind (I'm the only one patching them) should not affect the issue
     I'll describe.

     I'm not sure whether we didn't notice it before, or whether it's
     actually related to something in 4.9.

     Basically, our system VMs could not be patched via the qemu socket.
     The script simply errored out with a timeout while trying to push
     the data to the socket.

     Executing it manually (with the command line from the logs) gave
     the same result. I even tried the old Perl variant, which also had
     the same result.

     We finally found out that this issue happens only on our HVs that
     run qemu 2.3.0, from the CentOS 7 Virtualization special interest
     group repo. The others, which run qemu 1.5 from the official repos,
     can patch the system VMs fine.

     So I'm wondering whether anyone has tested 4.9 on KVM with qemu >=
     2.x? Maybe it's something else special in our setup, e.g. we're
     running the HVs from a preconfigured netboot image (PXE), but that
     applies to all of them, including those with qemu 1.5, so I have
     no idea.


     Linas Žilinskas
     Head of Development
     website <http://www.host1plus.com/> facebook
     <https://www.facebook.com/Host1Plus> twitter
     <https://twitter.com/Host1Plus> linkedin
     <https://www.linkedin.com/company/digital-energy-technologies-ltd.>


     Host1Plus is a division of Digital Energy Technologies Ltd.

     26 York Street, London W1U 6PZ, United Kingdom


Linas Žilinskas
Head of Development
website <http://www.host1plus.com/> facebook <https://www.facebook.com/Host1Plus> twitter <https://twitter.com/Host1Plus> linkedin <https://www.linkedin.com/company/digital-energy-technologies-ltd.>

Host1Plus is a division of Digital Energy Technologies Ltd.

26 York Street, London W1U 6PZ, United Kingdom
