On 05/04/2018 03:42, Gabe Black wrote:
Hi folks. I'm continuing to try to iron out problems with KVM on ARM, and the problem I'm working on specifically right now is that the mouse device gets spurious bad command bytes which panics gem5. What I've found so far is that the guest kernel will frequently time out while waiting for an ACK to a byte it sent to the mouse, even though the timeout looks like it should be 200ms, the simulation quantum I'm using is 1ms, and the delay between an event and the corresponding interrupt is configured to be 1us. I think this eventually throws the PS2 driver out of whack, and it ends up sending a data byte (or something else?) to the mouse which the mouse misinterprets as a command, causing the panic.
Last time I looked at this, I suspected that the PS/2 model wasn't clearing some interrupts. The GIC model in gem5 normally doesn't worry about that and raises an interrupt every time someone calls sendInt(). The behaviour I have observed from the kernel is that it doesn't post a new interrupt unless you first clear the old interrupt. This caused some issues with other models in the past (IIRC, the UART). Make sure you test this in a single-threaded simulator as well to avoid other weirdness due to thread synchronisation in gem5. I assume you're already doing this though.
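To make the difference concrete, here is a minimal toy sketch in Python. It is not gem5 or kernel code: the class names, `send_int()`, `clear_int()` and `device_sends()` are all hypothetical, and the second class only imitates the "no new interrupt until the old one is cleared" behaviour described above.

```python
class AlwaysRaiseGic:
    """Toy controller that signals an interrupt on every send_int() call."""

    def __init__(self):
        self.delivered = 0

    def send_int(self):
        self.delivered += 1

    def clear_int(self):
        # Nothing to do: this model keeps no record of a pending line.
        pass


class ClearBeforeRepostGic:
    """Toy controller that ignores send_int() while the line is still asserted."""

    def __init__(self):
        self.asserted = False
        self.delivered = 0

    def send_int(self):
        if self.asserted:
            # The previous interrupt was never cleared, so nothing new is posted.
            return
        self.asserted = True
        self.delivered += 1

    def clear_int(self):
        self.asserted = False


def device_sends(gic, n_bytes, clears_interrupt):
    """Hypothetical device that raises one interrupt per byte it produces."""
    for _ in range(n_bytes):
        gic.send_int()
        if clears_interrupt:
            gic.clear_int()
    return gic.delivered


if __name__ == "__main__":
    print(device_sends(AlwaysRaiseGic(), 3, clears_interrupt=False))
    print(device_sends(ClearBeforeRepostGic(), 3, clears_interrupt=False))
    print(device_sends(ClearBeforeRepostGic(), 3, clears_interrupt=True))
```

In this sketch a device that never clears its interrupt looks fine against the first model but loses every interrupt after the first against the second, which is the kind of mismatch that could show up as a guest timing out while waiting for an ACK.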
My current theory for why that's happening is that even when the VM is not running, the hardware supported virtual timer the CPU may have scheduled to keep track of its timeout may be "running" in the sense that the kernel will update it to reflect the descheduled time once the VM is running again. That could mean that 200ms of real time could pass, looking like 200ms of simulated time to the VCPU even if a smaller amount of actual execution time was supposed to happen. I'm not sure if that's a correct interpretation, but this ASPLOS paper *seems* to say something like that is possible. http://www.cs.columbia.edu/~cdall/pubs/asplos019-dall.pdf
I have never been happy with the way we handle the timer on the Arm KVM CPUs. It's possible to re-sync the virtual counter when entering KVM. A simple way to test that would be to update KVM_REG_ARM_TIMER_CNT / MISCREG_CNTVCT on every KVM entry. The Linux side should update the virtual timer offset when you write an absolute time to this register. This should work for Linux guests, but you might have issues with other OSes that insist on using the physical timer instead of the virtual timer.
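The arithmetic behind that re-sync can be sketched with a toy Python model. It relies only on the architectural relation CNTVCT = CNTPCT - CNTVOFF; the class, its methods and the numbers are hypothetical, and `write_cntvct()` merely stands in for what writing an absolute value to KVM_REG_ARM_TIMER_CNT is expected to do to the offset.

```python
class ToyVirtualTimer:
    """Toy model of a virtual counter: CNTVCT = CNTPCT - CNTVOFF."""

    def __init__(self):
        self.cntvoff = 0

    def read_cntvct(self, cntpct):
        # What the guest sees when it reads the virtual counter.
        return cntpct - self.cntvoff

    def write_cntvct(self, cntpct, value):
        # Writing an absolute count recomputes the offset so that the
        # virtual counter reads `value` at physical time `cntpct`.
        self.cntvoff = cntpct - value


def guest_view_after_pause(resync):
    """Return the guest's counter value after the VCPU has been descheduled."""
    timer = ToyVirtualTimer()
    # The guest has reached virtual count 1000 when the host counter reads 5000.
    timer.write_cntvct(cntpct=5000, value=1000)
    # The VCPU is not running while the host counter advances by 200000.
    host_now = 205000
    if resync:
        # On re-entry, write the simulated count back (hypothetical policy).
        timer.write_cntvct(cntpct=host_now, value=1000)
    return timer.read_cntvct(host_now)


if __name__ == "__main__":
    print(guest_view_after_pause(resync=False))
    print(guest_view_after_pause(resync=True))
```

Without the re-sync, the guest in this sketch sees the whole descheduled interval as elapsed virtual time; with it, the counter resumes from the simulated value.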
I've also seen very weird behavior as far as how many instructions KVM thinks are being executed per tick, so I wouldn't discount there being something off about how it's keeping track of time. I haven't been able to attach GDB to the KVM VCPUs, for instance, even though it looks like all the pieces are there for that to work. It seems that KVM is supposed to exit after a given number of instructions, but for some reason it doesn't.
I have used GDB in the past, but the support is very flaky. To use GDB with KVM, I had to force a thread context sync on every KVM entry/exit. You can do this by setting the alwaysSyncTC param, but it will kill your performance. The proper fix for this issue is to implement a custom KVM thread context that lazily synchronises individual registers instead of only synchronising on drain (and some other calls).

Cheers,
Andreas

_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
