Wow, that was fast. Good work.

The script seems to work for me. There was one case where I rebooted the
router and got the old link local IP somehow. I'm not sure if that was a
timing issue in seeing the existing /var/cache/cloud/cmdline before the new
one was written or what, but if it was a timing issue it would seem like we
should already have that problem with the existing cloud-early-config.

On Fri, Apr 12, 2019 at 12:24 PM Rohit Yadav <rohit.ya...@shapeblue.com>
wrote:

> Hi Marcus, Simon,
>
>
> I explored two of the short-term solutions, and I have a working (work in
> progress) script that replaces the patchviasocket script and uses the qemu
> guest agent (which is installed in the 4.11+ systemvmtemplate). This was
> part of a scoping exercise for solving the patching problem for qemu 2.12+
> (Ubuntu 19.04 has a 3.x version).
>
>
> This is what I have so far; however, further testing is needed:
>
> https://gist.github.com/rhtyd/ddb42c4c7581c4129ca04fbb829f16cf
>
>
> The logic is completely written in bash as:
>
> - Check whether we're able to contact the guest agent
>
> - Once we're able to connect, confirm that the I/O is not error prone
>
> - Then write the payload as a file (the ssh public key and cmdline string)
>
> - Then fix file permissions
> - Hope that internally cloud-early-config would detect the cmdline we had
> saved and patching would work
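For anyone following along, the steps above could be sketched roughly as follows. This is a Python illustration, not the bash from the gist. `send` is a hypothetical callable that delivers one command to the guest agent and returns the value of its "return" field; the sync token, the `/bin/chmod` path, and the `0600` mode are assumptions as well. `guest-ping`, `guest-sync`, `guest-file-open/write/close`, and `guest-exec` are guest agent commands, but whether the script uses exactly these is not confirmed here.

```python
import base64


def patch_via_guest_agent(send, path, payload, mode="0600"):
    """Write payload (bytes) to path in the guest, then fix its permissions.

    send is a hypothetical transport: it takes one command dict and returns
    the value of the agent's "return" field.
    """
    # Step 1: check that the guest agent answers at all.
    send({"execute": "guest-ping"})

    # Step 2: check that replies line up with requests by echoing a token.
    token = 12345  # arbitrary value chosen for this sketch
    if send({"execute": "guest-sync", "arguments": {"id": token}}) != token:
        raise RuntimeError("guest agent replies are out of sync")

    # Step 3: write the payload as a file.
    handle = send({"execute": "guest-file-open",
                   "arguments": {"path": path, "mode": "w+"}})
    try:
        buf = base64.b64encode(payload).decode("ascii")
        result = send({"execute": "guest-file-write",
                       "arguments": {"handle": handle, "buf-b64": buf}})
        if result["count"] != len(payload):
            raise RuntimeError("short write to %s" % path)
    finally:
        send({"execute": "guest-file-close", "arguments": {"handle": handle}})

    # Step 4: fix file permissions (chmod location in the guest is assumed).
    send({"execute": "guest-exec",
          "arguments": {"path": "/bin/chmod", "arg": [mode, path]}})
```

Keeping the transport behind `send` means the same sequence would work whether the commands go through virsh, socat, or the agent socket directly.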
>
>
> While this may work, for the long term a proper fix is needed that should
> be a standard patching mechanism across all hypervisors.
>
>
> Regards,
>
> Rohit Yadav
>
> Software Architect, ShapeBlue
>
> https://www.shapeblue.com
>
> ________________________________
> From: Marcus <shadow...@gmail.com>
> Sent: Friday, April 12, 2019 11:30:46 PM
> To: dev@cloudstack.apache.org
> Subject: Re: Latest Qemu KVM EV appears to be broken with ACS
>
> Long ago it was a disk. The problem was that these disks had to go
> somewhere, a place where they could survive migrations, which didn't work
> well for block based primary storage... at least for the code base at the
> time. Using virtio socket was seen as a fairly standard way to communicate
> temporary information to the guest, and didn't require managing the
> lifecycle of a special disk.
>
> I believe the current problem is that the sender needs to remain connected
> until the receiver has read. Maybe socat does this, but if so we need to
> ensure that it is available and applied as a new RPM dependency. In my
> testing, waiting on the sender side didn't 100% fix things, or sometimes
> took a very long time due to the backoff algorithm on the
> cloud-early-config receiver. Some tweaks to that made it more robust, but
> it is still a game of trying to coordinate timing of two services on either
> end. If it works though, I'm all for it.
>
> Just to throw another idea out there... If we want to fix this without
> involving storage, I might suggest switching to the qemu-guest-agent that
> now exists, with a socket and listening client already in the system vm.
> This would be far more robust, I think, than our scripting reading unix
> sockets without any sort of protocol or buffer control considerations, and
> would likely be more robust to changes in qemu as the guest agent is the
> primary target for the feature.
>
> We can directly write our /var/cache/cloud/cmdline from the host like so
> (I'm using virsh but we could perhaps communicate with the guest agent
> socket directly or via socat):
>
> virsh qemu-agent-command 19 '{"execute":"guest-file-open",
> "arguments":{"path":"/tmp/testfile","mode":"w+"}}'
> {"return":1001}
>
> virsh qemu-agent-command 19 '{"execute":"guest-file-write",
> "arguments":{"handle":1001,"buf-b64":"Zm9vIHdhcyBoZXJlCg=="}}'
> {"return":{"count":13,"eof":false}}
>
> virsh qemu-agent-command 19 '{"execute":"guest-file-close",
> "arguments":{"handle":1001}}'
> {"return":{}}
>
> root@r-54850-VM:~# cat /tmp/testfile
> foo was here
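As a cross-check on the example above, here is a small Python sketch that builds the same `guest-file-write` call and confirms that the `buf-b64` value is just the base64 of "foo was here" plus a newline (13 bytes, matching the returned count). The helper names `virsh_agent_argv` and `encode_buf` are made up for illustration; only the `virsh qemu-agent-command` invocation and the JSON shape come from the commands quoted above.

```python
import base64
import json


def encode_buf(data):
    """Base64-encode raw bytes for the guest-file-write "buf-b64" field."""
    return base64.b64encode(data).decode("ascii")


def virsh_agent_argv(domain, execute, arguments=None):
    """Build the argv list for `virsh qemu-agent-command <domain> <json>`."""
    command = {"execute": execute}
    if arguments is not None:
        command["arguments"] = arguments
    return ["virsh", "qemu-agent-command", str(domain), json.dumps(command)]


payload = b"foo was here\n"
argv = virsh_agent_argv(
    19,
    "guest-file-write",
    {"handle": 1001, "buf-b64": encode_buf(payload)},
)
print(argv[-1])
```

The list could be passed to `subprocess.run` on the host; building it as a list rather than a shell string avoids having to quote the JSON.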
>
> We are also able to detect via libvirt that the qemu guest agent is up and
> ready. You can see it in the XML when you list a VM.
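A minimal sketch of that readiness check, assuming the domain XML (for example from `virsh dumpxml`) carries the agent channel as a `target` element named `org.qemu.guest_agent.0` with a `state` attribute. The sample XML below is trimmed and illustrative, not copied from a real system VM, and `guest_agent_connected` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed channel name for the qemu guest agent in the domain XML.
AGENT_CHANNEL = "org.qemu.guest_agent.0"


def guest_agent_connected(domain_xml):
    """Return True if the guest agent channel is reported as connected."""
    root = ET.fromstring(domain_xml)
    for target in root.findall("./devices/channel/target"):
        if target.get("name") == AGENT_CHANNEL:
            return target.get("state") == "connected"
    return False  # no guest agent channel defined at all


# Illustrative, trimmed domain XML (an assumption, not real output).
SAMPLE_XML = """
<domain type='kvm'>
  <devices>
    <channel type='unix'>
      <target type='virtio' name='org.qemu.guest_agent.0' state='connected'/>
    </channel>
  </devices>
</domain>
"""

print(guest_agent_connected(SAMPLE_XML))
```

Polling this before sending any guest agent command would avoid guessing at timing on the host side.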
>
> We do need to keep other hypervisors in mind. This is just an option for a
> fix that doesn't involve a larger redesign.
>
> On Fri, Apr 12, 2019 at 10:21 AM Rohit Yadav <rohit.ya...@shapeblue.com>
> wrote:
>
> > Hi Simon,
> >
> >
> > I'm exploring a solution for the same issue. I've found that the
> > Python-based patching script does not wait for the message to be written
> > to the unix socket before the socket is closed. I reckon this could be
> > related to changes in serial port device handling in qemu-ev 2.12, as the
> > same mechanism used to work in past versions.
> >
> >
> > I'm exploring/testing a solution where I replace the Python-based
> > patching script with a bash one. Can you test the following in your
> > environment (ensure socat is installed)? Just back up the
> > patchviasocket.py file and replace it with this:
> >
> > https://gist.github.com/rhtyd/aab23357fef2d8a530c0e83ec8be10c5
> >
> >
> > The short-term solution would be one way to ensure patching works
> > without much change in the scripts or systemvmtemplate. However, in the
> > longer term we need to explore and standardize the patching mechanism
> > across all hypervisors, for example by using a small payload via a config
> > drive ISO.
> >
> >
> > Regards,
> >
> > Rohit Yadav
> >
> > Software Architect, ShapeBlue
> >
> > https://www.shapeblue.com
> >
> > ________________________________
> > From: Simon Weller <swel...@ena.com.INVALID>
> > Sent: Friday, April 12, 2019 8:29:04 PM
> > To: dev; users
> > Subject: Latest Qemu KVM EV appears to be broken with ACS
> >
> > All,
> >
> > After troubleshooting a strange issue with a new lab environment
> > yesterday, it appears that the patchviasocket functionality we rely on
> > for key and IP injection into our router/SSVM/CPVM images is broken with
> > qemu-kvm-ev-2.12.0-18.el7 (January 2019 release). This was tested on
> > CentOS 7.6.
> > No data is injected, and this was confirmed using socat on /dev/vport0p1.
> > qemu-kvm-ev-2.10.0-21.el7_5.7.1 works, so hopefully this will save
> > someone some pain and suffering trying to figure out why the deployment
> > seems broken.
> >
> > We're going to dig in and see if we can figure out the patches
> > responsible for it breaking.
> >
> > -Si
> >
> >
> >
> > rohit.ya...@shapeblue.com
> > www.shapeblue.com
> > Amadeus House, Floral Street, London WC2E 9DP UK
> > @shapeblue
> >
> >
> >
> >
>
