Hello Jorge, or anyone else affected,

Accepted libvirt into zesty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/libvirt/2.5.0-3ubuntu5.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-zesty to verification-done-zesty. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-zesty. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

** Changed in: libvirt (Ubuntu Yakkety)
       Status: In Progress => Won't Fix

** Changed in: libvirt (Ubuntu Zesty)
       Status: In Progress => Fix Committed

** Tags added: verification-needed verification-needed-zesty

** Changed in: libvirt (Ubuntu Xenial)
       Status: In Progress => Fix Committed

** Tags added: verification-needed-xenial

--
You received this bug notification because you are a member of नेपाली भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1705132

Title:
  Large memory guests, "error: monitor socket did not show up: No such file or directory"

Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive mitaka series:
  New
Status in Ubuntu Cloud Archive ocata series:
  New
Status in libvirt package in Ubuntu:
  Fix Released
Status in libvirt source package in Xenial:
  Fix Committed
Status in libvirt source package in Yakkety:
  Won't Fix
Status in libvirt source package in Zesty:
  Fix Committed

Bug description:

[Description]

- Configured a machine with 32 static VCPUs, 160GB of RAM using 1G hugepages on a NUMA capable machine.

Domain definition (http://pastebin.ubuntu.com/25121106/)

- Once started (virsh start).
Libvirt log.

    LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
    QEMU_AUDIO_DRV=none /usr/bin/kvm-spice -name reproducer2 -S \
      -machine pc-i440fx-2.5,accel=kvm,usb=off -cpu host -m 124928 \
      -realtime mlock=off -smp 32,sockets=16,cores=1,threads=2 \
      -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,share=yes,size=64424509440,host-nodes=0,policy=bind \
      -numa node,nodeid=0,cpus=0-15,memdev=ram-node0 \
      -object memory-backend-file,id=ram-node1,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,share=yes,size=66571993088,host-nodes=1,policy=bind \
      -numa node,nodeid=1,cpus=16-31,memdev=ram-node1 \
      -uuid d7a4af7f-7549-4b44-8ceb-4a6c951388d4 -no-user-config -nodefaults \
      -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-reproducer2/monitor.sock,server,nowait \
      -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc \
      -no-shutdown -boot strict=on \
      -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
      -drive file=/var/lib/uvtool/libvirt/images/test.qcow,format=qcow2,if=none,id=drive-virtio-disk0,cache=none \
      -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
      -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 \
      -vnc 127.0.0.1:0 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 \
      -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on

Then the following error is raised.

    virsh start reproducer2
    error: Failed to start domain reproducer2
    error: monitor socket did not show up: No such file or directory

- The fix is done via backports. As a TL;DR, the change does two things:

  1. Instead of sleeping for a very short interval (1ms) in a loop for a very long time, start small but increase the sleep exponentially for the few cases that need a long wait. That way fast actions are done fast, but long actions are not CPU hogs.
  2. Huge guests get ~1s of extra timeout per 1GB of memory to come up, which allows huge guests to initialize properly.

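The two points above can be sketched roughly as follows. This is an illustrative Python sketch only, not the libvirt code from the backport: the function names, the 30 second base timeout (taken from the "~30 seconds" mentioned in this report) and the lack of a cap on the sleep interval are all assumptions made for the example.

```python
import os
import time

# Assumption: base timeout of 30s, as suggested by the "~30 seconds" in this report.
BASE_TIMEOUT_S = 30


def monitor_timeout_s(guest_mem_gib, base_s=BASE_TIMEOUT_S):
    """Base timeout plus roughly one extra second per GB of guest memory."""
    return base_s + guest_mem_gib


def backoff_delays(timeout_s, first_s=0.001):
    """Yield sleep intervals that start at 1ms and double each round.

    The last interval is trimmed so the total never exceeds timeout_s.
    """
    remaining = timeout_s
    delay = first_s
    while remaining > 0:
        step = min(delay, remaining)
        yield step
        remaining -= step
        delay *= 2


def wait_for_socket(path, guest_mem_gib, exists=os.path.exists, sleep=time.sleep):
    """Poll for the monitor socket until it shows up or the time budget is spent."""
    if exists(path):
        return True
    for delay in backoff_delays(monitor_timeout_s(guest_mem_gib)):
        sleep(delay)
        if exists(path):
            return True
    return False
```

With these assumed numbers a small guest whose socket appears quickly is noticed within a few milliseconds, while a 120G guest gets a budget of about 150 seconds and is polled only a couple of dozen times instead of tens of thousands of times at a fixed 1ms interval.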
[Impact]

* Cannot start virtual machines with large pools of memory allocated on NUMA nodes.

[Test Case]

* This is a tradeoff of memory clearing speed vs guest size. Once the clearing of guest memory exceeds ~30 seconds the issue will trigger.

* The guest must be backed by huge pages, as otherwise the kernel will fault pages in on demand instead of needing the initial clear.

* One way to "slow down" the init is to configure a machine with multiple NUMA nodes.

    root@buneary:/home/ubuntu# virsh freepages 0 1G
    1048576KiB: 60

    root@buneary:/home/ubuntu# virsh freepages 1 1G
    1048576KiB: 62

* Another way to slow down the init is to just use a really huge guest. In the example a 122G guest was enough. (full guest definition: http://paste.ubuntu.com/25125500/)

    <memory unit='GiB'>120</memory>
    <currentMemory unit='GiB'>120</currentMemory>
    <memoryBacking>
      <hugepages>
        <page size='1' unit='GiB' nodeset='0'/>
        <page size='1' unit='GiB' nodeset='1'/>
      </hugepages>
    </memoryBacking>
    <cpu mode='host-passthrough'>
      <topology sockets='16' cores='1' threads='2'/>
      <numa>
        <cell id='0' cpus='0-15' memory='60' unit='GiB' memAccess='shared'/>
        <cell id='1' cpus='16-31' memory='62' unit='GiB' memAccess='shared'/>
      </numa>
    </cpu>

* Define the guest, and try to start it.

    $ virsh define reproducer.xml
    $ virsh start reproducer

* Verify that the following error is raised:

    root@buneary:/home/ubuntu# virsh start reproducer2
    error: Failed to start domain reproducer2
    error: monitor socket did not show up: No such file or directory

[Expected Behavior]

* Machine is started without issues, as displayed in https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1705132/comments/7

[Regression Potential]

* The behavior on timeouts around starting a guest changed. We backported the fix along with a fix to that new behavior (where guests seemed to wait forever due to the exponential wait). Still, the "allowed" wait time is increased, but users might expect an instant result, as they are used to from their laptop.
  Now if one starts a 1TB guest, the allowed time is base+1000s. A user might think for a while that it is broken or hanging, but there is no way to avoid that. OTOH, before the fix it would have failed to start after 30 seconds, so this is not really a regression IMHO.

[Other Info]

https://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=85af0b803cd19a03f71bd01ab4e045552410368f;hp=67dcb797ed7f1fbb048aa47006576f424923933b

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1705132/+subscriptions

_______________________________________________
Mailing list: https://launchpad.net/~group.of.nepali.translators
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~group.of.nepali.translators
More help   : https://help.launchpad.net/ListHelp

