On Tue, 2026-05-19 at 15:57 -0700, Oliver Upton wrote:
> On Tue, May 19, 2026 at 10:58:05PM +0100, David Woodhouse wrote:
> > On Tue, 2026-05-19 at 14:10 -0700, Oliver Upton wrote:
> > > And in the absence of clear evidence of a guest depending on the broken
> > > IGROUPR behavior, I don't see how the guest-side changes of Christoffer's
> > > series are any different from the multitude of bug fixes that we take
> > > every single release cycle. It is an unfortunate bug and I concur with
> > > Marc that it doesn't seem like the sort of thing a guest could rely
> > > upon.
> >
> > I find this concerning, because I've already explained this.
> >
> > There is a very real possibility of guests simply not *noticing* that
> > they had bugs in this area, as it didn't *matter* what they wrote to
> > these registers since it never worked.
> >
> > There is an even larger possibility of guests having worked around the
> > original issue by *detecting* whether the registers were actually
> > writable before choosing to use the alternative groups. And if such a
> > guest launches on a new kernel and then needs to be rolled back to an
> > older kernel, that will also break.
>
> The onus is on you to substantiate this claim. I would imagine after
> carrying the revert for so long that there must be at least one example
> of such a guest?

What? No. We have *avoided* having the bug, specifically so that we do
not find out the consequences of the bug.

> What ifs and maybes do not meet the bar, in my opinion, for preserving
> bug emulation in KVM. Of course there could be a little flexibility with
> that but we need to have some way of discriminating between bug fixes
> and genuine guest expectations around the behavior of virtual hardware.

I believe you have this completely backwards. The expectation of KVM is
that we do not change guest-visible behaviour if there's any reasonable
chance that it might cause problems. A stable and mature platform
doesn't get to play in its ivory tower and randomly inflict breakage on
guests because they "deserve it".

I've literally explained the potential failure modes, including the one
on rollback if a guest *does* change the group configuration and then
needs to be rolled back to the older kernel that doesn't support it.

And yes, "ifs and maybes" absolutely *are* the quality bar expected by
KVM because (again, as already explained more than once) as we
accumulate a bunch of such "unlikely" breakages in a fleet upgrade
from, say, 6.1 to 6.12, the likelihood of *one* of them actually
turning out to afflict *one* of the zoo of guest operating systems
approaches 1.

We don't get to just YOLO it.

> > > Wrong or not, this behavior is documented unambiguously. From the VGICv2
> > > UAPI documentation:
> > >
> > > """
> > > Userspace should set GICD_IIDR before setting any other registers (both
> > > KVM_DEV_ARM_VGIC_GRP_DIST_REGS and KVM_DEV_ARM_VGIC_GRP_CPU_REGS) to ensure
> > > the expected behavior. Unless GICD_IIDR has been set from userspace, writes
> > > to the interrupt group registers (GICD_IGROUPR) are ignored.
> > > """
> > >
> > > I'm not inclined to change that.
> >
> > That'll all very well... but as far as I can tell, QEMU *doesn't* set
> > GICD_IIDR, so it still gets the bizarre behaviour where the *guest* can
> > write the registers, but userspace can't. So it looks like it'll work
> > except migration will fail. Am I missing something?
>
> That's exactly it, and why I said tying up UAPI opt-in with
> guest-visible registers is a really bad idea.
>
> > But honestly, I don't care one iota about GICv2; I was only trying to
> > do the cleanup while I was there. Feel free to drop that part entirely.
> >
> > > As a way out of this whole mess, can we
> > > instead:
> > >
> > > - Allow userspace to set IIDR.Revision to 1
> > >
> > > - Drop any bug emulation from the handling of IGROUPR registers
> >
> > It doesn't make sense to allow setting IIDR.Revision to 1 *without* the
> > one-liner that actually implements the corresponding behaviour change
> > in the IGROUPR registers.
>
> As I described earlier, this whole IIDR crap inarguably broke UAPI and
> obviously normal guest behavior (i.e. reading the register). At minimum
> we need to permit previously-valid values for IIDR, even if they carry
> no implied behaviors.

But the whole *point* of IIDR is to preserve the behaviour. To set the
IIDR and *not* have the corresponding behaviour is insanity.

> > And as explained at least twice now, it's the
> > behaviour change that's *important* here.
> >
> > The fact that it's a long-standing bug in KVM which downstream has been
> > working around for a long time doesn't matter. The unconditional
> > behavioural change *is* a bug and we should fix it.
>
> That is the nature of a bug fix. If you can provide some concrete
> evidence of a guest depending on the RAZ/WI behavior then I agree we
> need to preserve the old behavior.
>
> Otherwise I see this as a matter of principle in how we do bug fixes to
> KVM. Even if upstream took the strictest possible stance towards behavior
> changes we will invariably fail to account for some minutia.

No. Don't pretend that this is hard. KVM on x86 has been quietly
getting this right for years. Yes, there is sometimes *some*
subjectivity around it, and it's sometimes reasonable to just
unilaterally change behaviours. This is not, and was not, one of those
cases.

> > > - Special-case the stupid GICv2 UAPI where IGROUPR are only writable if
> > >   the VMM has written to IIDR and the revision >= 2
> >
> > That already *is* a special case, right? And you'd rather leave it as it is?
>
> Left as documented, yes. With the exception that revision == 1 writes
> not be considered opt-in to restorable IGROUPR.

Don't do that. Just leave it broken, with QEMU not even working. I'm
beyond caring about GICv2 now.

