Hi Marcus, Simon,

I explored two of the short-term solutions and have a working (work in progress)
script that replaces the patchviasocket script and uses the qemu guest agent
(which is installed in the 4.11+ systemvmtemplate). This was part of a scoping
exercise for solving the patching problem for qemu 2.12+ (Ubuntu 19.04 ships a
3.x version).


This is what I have so far; further testing is still needed:

https://gist.github.com/rhtyd/ddb42c4c7581c4129ca04fbb829f16cf


The logic is written entirely in bash and works as follows:

- Check whether we're able to contact the guest agent

- Once we're able to connect, confirm that the I/O is not error prone

- Then write the payload as a file (the ssh public key and cmdline string)

- Then fix the file permissions

- Hope that cloud-early-config inside the VM detects the cmdline we saved,
and that patching then works
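
A minimal sketch of the first step above (it is not taken from the gist): poll the guest agent with guest-ping through virsh until it answers. The function name wait_for_agent, its arguments and the retry defaults are assumptions made for illustration only.

```shell
# Sketch only: wait until the qemu guest agent inside a domain answers.
# wait_for_agent is a hypothetical helper, not part of CloudStack.
# Usage: wait_for_agent <domain> [retries] [delay-seconds]
wait_for_agent() {
    local domain="$1" retries="${2:-30}" delay="${3:-1}" attempt=1
    while [ "$attempt" -le "$retries" ]; do
        # guest-ping succeeds with an empty return object once the
        # agent is reachable; virsh exits non-zero otherwise.
        if virsh qemu-agent-command "$domain" '{"execute":"guest-ping"}' >/dev/null 2>&1; then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 1
}
```

For example, "wait_for_agent r-54850-VM 60 2 || exit 1" would give the agent roughly two minutes to come up before the script gives up.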


While this may work, in the long term we need a proper fix: a standard
patching mechanism that works across all hypervisors.


Regards,

Rohit Yadav

Software Architect, ShapeBlue

https://www.shapeblue.com

________________________________
From: Marcus <shadow...@gmail.com>
Sent: Friday, April 12, 2019 11:30:46 PM
To: dev@cloudstack.apache.org
Subject: Re: Latest Qemu KVM EV appears to be broken with ACS

Long ago it was a disk. The problem was that these disks had to go
somewhere, a place where they could survive migrations, which didn't work
well for block based primary storage... at least for the code base at the
time. Using virtio socket was seen as a fairly standard way to communicate
temporary information to the guest, and didn't require managing the
lifecycle of a special disk.

I believe the current problem is that the sender needs to remain connected
until the receiver has read. Maybe socat does this, but if so we need to
ensure that it is available and applied as a new RPM dependency. In my
testing, waiting on the sender side didn't 100% fix things, or sometimes
took a very long time due to the backoff algorithm on the
cloud-early-config receiver. Some tweaks to that made it more robust, but
it is still a game of trying to coordinate timing of two services on either
end. If it works though, I'm all for it.

Just to throw another idea out there... If we want to fix this without
involving storage, I might suggest switching to the qemu-guest-agent that
now exists, with a socket and listening client already in the system vm.
This would be far more robust, I think, than our scripting reading unix
sockets without any sort of protocol or buffer control considerations, and
would likely be more robust to changes in qemu as the guest agent is the
primary target for the feature.

We can directly write our /var/cache/cloud/cmdline from the host like so
(I'm using virsh but we could perhaps communicate with the guest agent
socket directly or via socat):

virsh qemu-agent-command 19 '{"execute":"guest-file-open",
"arguments":{"path":"/tmp/testfile","mode":"w+"}}'
{"return":1001}

virsh qemu-agent-command 19 '{"execute":"guest-file-write",
"arguments":{"handle":1001,"buf-b64":"Zm9vIHdhcyBoZXJlCg=="}}'
{"return":{"count":13,"eof":false}}

virsh qemu-agent-command 19 '{"execute":"guest-file-close",
"arguments":{"handle":1001}}'
{"return":{}}

root@r-54850-VM:~# cat /tmp/testfile
foo was here

We are also able to detect via libvirt that the qemu guest agent is up and
ready. You can see it in the XML when you list a VM.
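
As a rough sketch of that check: libvirt reports the channel state on the <target> element of the guest agent channel in the domain XML. The helper name agent_channel_connected is hypothetical, and the check assumes the channel uses the usual name org.qemu.guest_agent.0 and that libvirt prints the name and state attributes on the same line.

```shell
# Sketch only: succeed if the domain XML shows the guest agent channel
# as connected. agent_channel_connected is a hypothetical helper.
# Usage: agent_channel_connected <domain>
agent_channel_connected() {
    # Looks for a line such as:
    #   <target type='virtio' name='org.qemu.guest_agent.0' state='connected'/>
    virsh dumpxml "$1" 2>/dev/null |
        grep "name='org.qemu.guest_agent.0'" |
        grep -q "state='connected'"
}
```

A connected channel only means something inside the guest has the port open, so an actual guest agent command is still the more reliable readiness test.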

We do need to keep other hypervisors in mind. This is just an option for a
fix that doesn't involve a larger redesign.

On Fri, Apr 12, 2019 at 10:21 AM Rohit Yadav <rohit.ya...@shapeblue.com>
wrote:

> Hi Simon,
>
>
> I'm exploring a solution for the same. I've found that the python based
> patching script does not wait for the message to be written to the unix
> socket before the socket is closed. I reckon this could be related to
> changes in serial port device handling in qemu-ev 2.12, as the same
> mechanism used to work in past versions.
>
>
> I'm exploring/testing a solution where I replace the python based patching
> script with a bash one. Can you test the following in your environment
> (ensure socat is installed)? Just back up the patchviasocket.py file and
> replace it with this:
>
> https://gist.github.com/rhtyd/aab23357fef2d8a530c0e83ec8be10c5
>
>
> The short term solution would be one of the ways to ensure patching works
> without much change in the scripts or systemvmtemplate. However, longer
> term we need to explore and standardize the patching mechanism across all
> hypervisors, for example by using a small payload via a config drive iso.
>
>
> Regards,
>
> Rohit Yadav
>
> Software Architect, ShapeBlue
>
> https://www.shapeblue.com
>
> ________________________________
> From: Simon Weller <swel...@ena.com.INVALID>
> Sent: Friday, April 12, 2019 8:29:04 PM
> To: dev; users
> Subject: Latest Qemu KVM EV appears to be broken with ACS
>
> All,
>
> After troubleshooting a strange issue with a new lab environment
> yesterday, it appears that the patchviasocket functionality we rely on for
> key and ip injection into our router/SSVM/CPVM images is broken with
> qemu-kvm-ev-2.12.0-18.el7 (January 2019 release). This was tested on CentOS
> 7.6.
> No data is injected and this was confirmed using socat on /dev/vport0p1.
> qemu-kvm-ev-2.10.0-21.el7_5.7.1 works, so hopefully this will save someone
> some pain and suffering trying to figure out why the deployment seems broken.
>
> We're going to dig in and see if we can figure out the patches responsible
> for it breaking.
>
> -Si
>
>
>
> rohit.ya...@shapeblue.com
> www.shapeblue.com
> Amadeus House, Floral Street, London WC2E 9DP, UK
> @shapeblue
>
>
>
>
