Hi Daniel,

just wanted to ask if you had a chance to check the shared file.
Let me know if you need more info.

Best regards,
Roman

On Tue, Sep 13, 2022 at 5:31 PM Roman Mohr <rm...@google.com> wrote:

> On Thu, Sep 8, 2022 at 4:32 PM Daniel P. Berrangé <berra...@redhat.com> wrote:
>
>> On Thu, Sep 08, 2022 at 04:22:17PM +0200, Roman Mohr wrote:
>> > On Thu, Sep 8, 2022 at 4:04 PM Daniel P. Berrangé <berra...@redhat.com> wrote:
>> >
>> > > On Thu, Sep 08, 2022 at 02:23:31PM +0100, Daniel P. Berrangé wrote:
>> > > > On Thu, Sep 08, 2022 at 03:10:09PM +0200, Roman Mohr wrote:
>> > > > > On Thu, Sep 8, 2022 at 2:56 PM Daniel P. Berrangé <berra...@redhat.com> wrote:
>> > > > >
>> > > > > > On Thu, Sep 08, 2022 at 02:24:00PM +0200, Roman Mohr wrote:
>> > > > > > > Hi,
>> > > > > > >
>> > > > > > > I have a question regarding capability caching in the context
>> > > > > > > of KubeVirt. Since KubeVirt starts one libvirt instance per VM,
>> > > > > > > libvirt has to re-discover the qemu capabilities on every VM
>> > > > > > > start, which leads to a 1-2s+ delay in startup.
>> > > > > > >
>> > > > > > > We already discover the features in a dedicated KubeVirt pod
>> > > > > > > on each node. Therefore I tried to copy the capabilities over
>> > > > > > > to see if that would work.
>> > > > > > >
>> > > > > > > It looks like in general it could work, but libvirt seems to
>> > > > > > > detect a mismatch in the exposed KVM CPUID in every pod, and
>> > > > > > > therefore it invalidates the cache. The recreated capability
>> > > > > > > cache looks exactly like the original one though ...
>> > > > > > >
>> > > > > > > The check responsible for the invalidation is this:
>> > > > > > >
>> > > > > > > ```
>> > > > > > > Outdated capabilities for '%s': host cpuid changed
>> > > > > > > ```
>> > > > > > >
>> > > > > > > So the KVM_GET_SUPPORTED_CPUID call seems to return slightly
>> > > > > > > different values in different containers.
>> > > > > > >
>> > > > > > > After trying out the attached golang scripts in different
>> > > > > > > containers, I could indeed see differences.
>> > > > > > >
>> > > > > > > I can however not really judge what the differences in these
>> > > > > > > KVM function registers mean, and I am curious if someone else
>> > > > > > > knows. The files are attached too (as JSON for easy diffing).
>> > > > > >
>> > > > > > Can you confirm whether the two attached data files were
>> > > > > > captured by containers running on the same physical host, or
>> > > > > > whether each container could have run on a different host.
>> > > > >
>> > > > > They are coming from the same host; that is the most surprising
>> > > > > bit for me. I am also very sure that this is the case, because I
>> > > > > only had one k8s node from where I took these.
>> > > > > The containers however differ (obviously) in namespaces and in
>> > > > > the privilege level (less obvious). The handler dump is from a
>> > > > > fully privileged container.
>> > > >
>> > > > The privilege level sounds like something that might be impactful,
>> > > > so I'll investigate that. I'd be pretty surprised for namespaces
>> > > > to have any impact though.
>> > >
>> > > The privilege level is a red herring. Peter reminded me that we have
>> > > to filter out some parts of CPUID because the APIC IDs vary depending
>> > > on what host CPU the task executes on.
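The per-CPU variation Daniel describes can be sketched in Go, the language of the attached scripts. This is only a minimal illustration, not libvirt's implementation: `CPUIDEntry` and `FilterAPICID` are hypothetical names, and the masks mirror the virhostcpu.c filter quoted in the thread.

```go
package main

import "fmt"

// CPUIDEntry is a hypothetical mirror of the comparison-relevant
// fields of KVM's struct kvm_cpuid_entry2.
type CPUIDEntry struct {
	Function, Index    uint32
	EAX, EBX, ECX, EDX uint32
}

// FilterAPICID reproduces the masking libvirt applies before comparing
// CPUID dumps: leaf 0x01 carries the initial local APIC ID in
// EBX[31:24], and leaf 0x0b varies in the low byte of EDX, so those
// bits are zeroed out.
func FilterAPICID(e CPUIDEntry) CPUIDEntry {
	if e.Function == 0x01 && e.Index == 0x00 {
		e.EBX &= 0x00ffffff // drop the initial APIC ID byte
	}
	if e.Function == 0x0b {
		e.EDX &= 0xffffff00 // drop the varying low EDX byte
	}
	return e
}

func main() {
	// The same leaf captured while running on two different host CPUs:
	// only the APIC ID byte in EBX differs.
	cpu0 := CPUIDEntry{Function: 0x01, EBX: 0x00100800}
	cpu7 := CPUIDEntry{Function: 0x01, EBX: 0x07100800}
	fmt.Println(FilterAPICID(cpu0) == FilterAPICID(cpu7)) // prints "true"
}
```

Without this normalization, two captures taken moments apart on the same host can disagree simply because the capturing task was scheduled on a different core each time.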
>> > >
>> > > https://gitlab.com/libvirt/libvirt/-/blob/master/src/util/virhostcpu.c#L1346
>> > >
>> > > In the 2 JSON files you provided, the differences I see should
>> > > already be matched by
>> > >
>> > >     /* filter out local apic id */
>> > >     if (entry->function == 0x01 && entry->index == 0x00)
>> > >         entry->ebx &= 0x00ffffff;
>> > >     if (entry->function == 0x0b)
>> > >         entry->edx &= 0xffffff00;
>> > >
>> > > so those differences ought not to be causing the cache to be
>> > > invalidated.
>> >
>> > Hm, maybe I misinterpreted the logs then. The snippet I looked at was
>> > this:
>> >
>> > ```
>> > {"component":"virt-launcher","level":"info","msg":"/dev/kvm has changed (1661786802 vs 0)","pos":"virQEMUCapsKVMUsable:4850","subcomponent":"libvirt","thread":"25","timestamp":"2022-08-29T15:26:42.936000Z"}
>> > {"component":"virt-launcher","level":"info","msg":"a=0x7f8138153ba0, b=0x7f818001c480","pos":"virCPUDataIsIdentical:1178","subcomponent":"libvirt","thread":"25","timestamp":"2022-08-29T15:26:42.939000Z"}
>> > {"component":"virt-launcher","level":"info","msg":"Outdated capabilities for '/usr/bin/qemu-system-x86_64': host cpuid changed","pos":"virQEMUCapsIsValid:4993","subcomponent":"libvirt","thread":"25","timestamp":"2022-08-29T15:26:42.939000Z"}
>> > {"component":"virt-launcher","level":"info","msg":"Outdated cached capabilities '/var/cache/libvirt/qemu/capabilities/926803a9278e445ec919c2b6cbd8c1c449c75b26dcb1686b774314180376c725.xml' for '/usr/bin/qemu-system-x86_64'","pos":"virFileCacheLoad:163","subcomponent":"libvirt","thread":"25","timestamp":"2022-08-29T15:26:42.939000Z"}
>> > ```
>>
>> Can you capture the
>> /var/cache/libvirt/qemu/capabilities/926803a9278e445ec919c2b6cbd8c1c449c75b26dcb1686b774314180376c725.xml
>> from the virt-handler and virt-launcher pods.
>> It contains a <cpuid> block that will show us the differences libvirt
>> recorded, /after/ libvirt has done its filtering. This will show if
>> there is more we need to filter.
>
> Done. I only attached the one from the handler, since it is 100%
> identical to the launcher's.
>
> Let me know if you need more information.
>
> Thanks and best regards,
> Roman
>
>> > I had the impression from the code that the `/dev/kvm` change (because
>> > the containers are not created at the same time) does not invalidate
>> > it either.
>> >
>> > I added the whole debug log, maybe I missed something obvious.
>> >
>> > Does it make a difference if the cache is created via `virsh
>> > domcapabilities` and `virsh capabilities` or via defining the first
>> > domain?
>>
>> They'll all end up at the same caching code, so it should not make any
>> difference.
>>
>> > Best regards,
>> > Roman
>>
>> > > With regards,
>> > > Daniel
>> > > --
>> > > |: https://berrange.com         -o- https://www.flickr.com/photos/dberrange :|
>> > > |: https://libvirt.org          -o- https://fstop138.berrange.com :|
>> > > |: https://entangle-photo.org   -o- https://www.instagram.com/dberrange :|