Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
On 09/13/2011 10:21 PM, Don Zickus wrote: Or are you saying an NMI in an idle system will have the same %rip thus falsely detecting a back-to-back NMI? That's easy to avoid - insert an instruction zeroing the last nmi_rip somewhere before or after hlt. It's always okay to execute such an instruction (outside the nmi handler itself), since nmi_rip is meant to detect a no instructions executed condition. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
On Wed, Sep 14, 2011 at 10:00:07AM +0300, Avi Kivity wrote: On 09/13/2011 10:21 PM, Don Zickus wrote: Or are you saying an NMI in an idle system will have the same %rip thus falsely detecting a back-to-back NMI? That's easy to avoid - insert an instruction zeroing the last nmi_rip somewhere before or after hlt. It's always okay to execute such an instruction (outside the nmi handler itself), since nmi_rip is meant to detect a no instructions executed condition. Ah. Like a touch_nmi_watchdog() type of thing. Interesting. I'll poke around the idle code. Need to instrument a reproducer first. Thanks, Don -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
On Wed, Sep 14, 2011 at 10:00:07AM +0300, Avi Kivity wrote: On 09/13/2011 10:21 PM, Don Zickus wrote: Or are you saying an NMI in an idle system will have the same %rip thus falsely detecting a back-to-back NMI? That's easy to avoid - insert an instruction zeroing the last nmi_rip somewhere before or after hlt. It's always okay to execute such an instruction (outside the nmi handler itself), since nmi_rip is meant to detect a no instructions executed condition. At least for classic hlt there is no simple after hlt because it's all interrupt handlers and exceptions and everything else that can interrupt combined. It may work with newer MWAIT. -Andi -- a...@linux.intel.com -- Speaking for myself only. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
On 09/14/2011 05:49 PM, Andi Kleen wrote: On Wed, Sep 14, 2011 at 10:00:07AM +0300, Avi Kivity wrote: On 09/13/2011 10:21 PM, Don Zickus wrote: Or are you saying an NMI in an idle system will have the same %rip thus falsely detecting a back-to-back NMI? That's easy to avoid - insert an instruction zeroing the last nmi_rip somewhere before or after hlt. It's always okay to execute such an instruction (outside the nmi handler itself), since nmi_rip is meant to detect a no instructions executed condition. At least for classic hlt there is no simple after hlt because it's all interrupt handlers and exceptions and everything else that can interrupt combined. If an NMI hits in an interrupt handler, or in the after hlt section before the write-to-last-nmi-rip, then we'll see that %rip has changed. If it hits after the write-to-last-nmi-rip instruction (or in the hlt itself), then we'll also see that %rip has changed, due to the effect of that instruction. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
If an NMI hits in an interrupt handler, or in the after hlt section before the write-to-last-nmi-rip, then we'll see that %rip has changed. If it hits after the write-to-last-nmi-rip instruction (or in the hlt itself), then we'll also see that %rip has changed, due to the effect of that instruction. It won't handle multiple NMIs in halt. I assume that's reasonable common. -Andi -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
On 09/14/2011 08:28 PM, Andi Kleen wrote: If an NMI hits in an interrupt handler, or in the after hlt section before the write-to-last-nmi-rip, then we'll see that %rip has changed. If it hits after the write-to-last-nmi-rip instruction (or in the hlt itself), then we'll also see that %rip has changed, due to the effect of that instruction. It won't handle multiple NMIs in halt. I assume that's reasonable common. Why not? -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
On Wed, Sep 14, 2011 at 10:26:21PM +0300, Avi Kivity wrote: On 09/14/2011 08:28 PM, Andi Kleen wrote: If an NMI hits in an interrupt handler, or in the after hlt section before the write-to-last-nmi-rip, then we'll see that %rip has changed. If it hits after the write-to-last-nmi-rip instruction (or in the hlt itself), then we'll also see that %rip has changed, due to the effect of that instruction. It won't handle multiple NMIs in halt. I assume that's reasonable common. Why not? They all have the same original RIPs and there is no way to distingush them. -Andi -- a...@linux.intel.com -- Speaking for myself only. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
On 09/14/2011 10:34 PM, Andi Kleen wrote: On Wed, Sep 14, 2011 at 10:26:21PM +0300, Avi Kivity wrote: On 09/14/2011 08:28 PM, Andi Kleen wrote: If an NMI hits in an interrupt handler, or in the after hlt section before the write-to-last-nmi-rip, then we'll see that %rip has changed. If it hits after the write-to-last-nmi-rip instruction (or in the hlt itself), then we'll also see that %rip has changed, due to the effect of that instruction. It won't handle multiple NMIs in halt. I assume that's reasonable common. Why not? They all have the same original RIPs and there is no way to distingush them. That's how we detect multiple NMIs. 1. First NMI is posted 2. NMI handler starts 3. 2nd NMI posted, queued 4. First NMI source handled 5. IRET 6. Queued NMI hits the core 7. back-to-back NMI detected (same rip) 8. Second (and third...) NMI source handled 9. Execution continues. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
On Wed, Sep 07, 2011 at 08:09:37PM +0300, Avi Kivity wrote: But then the downside here is we accidentally handle an NMI that was latched. This would cause a 'Dazed on confused' message as that NMI was already handled by the previous NMI. We are working on an algorithm to detect this condition and flag it (nothing complicated). But it may never be perfect. On the other hand, what else are we going to do with an edge-triggered shared interrupt line? How about, during NMI, save %rip to a per-cpu variable. Handle just one cause. If, on the next NMI, we hit the same %rip, assume back-to-back NMI has occured and now handle all causes. So I got around to implementing this and it seems to work great. The back to back NMIs are detected properly using the %rip and that info is passed to the NMI notifier. That info is used to determine if only the first handler to report 'handled' is executed or _all_ the handlers are executed. I think all the 'unknown' NMIs I generated with various perf runs have disappeared. I'll post a new version of my nmi notifier rewrite soon. Thanks for the great suggestion Avi! Cheers, Don -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
So I got around to implementing this and it seems to work great. The back to back NMIs are detected properly using the %rip and that info is passed to the NMI notifier. That info is used to determine if only the first handler to report 'handled' is executed or _all_ the handlers are executed. I think all the 'unknown' NMIs I generated with various perf runs have disappeared. I'll post a new version of my nmi notifier rewrite soon. This will fail when the system is idle. -Andi -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
On Tue, Sep 13, 2011 at 09:03:20PM +0200, Andi Kleen wrote: So I got around to implementing this and it seems to work great. The back to back NMIs are detected properly using the %rip and that info is passed to the NMI notifier. That info is used to determine if only the first handler to report 'handled' is executed or _all_ the handlers are executed. I think all the 'unknown' NMIs I generated with various perf runs have disappeared. I'll post a new version of my nmi notifier rewrite soon. This will fail when the system is idle. Oh one other thing I forgot to mention is that an NMI handler has to return a value greater than 1, meaning that it handle multiple NMI events during this NMI to even enable the 'NMI swallowing' algorithm. Cheers, Don -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
On Tue, Sep 13, 2011 at 09:03:20PM +0200, Andi Kleen wrote: So I got around to implementing this and it seems to work great. The back to back NMIs are detected properly using the %rip and that info is passed to the NMI notifier. That info is used to determine if only the first handler to report 'handled' is executed or _all_ the handlers are executed. I think all the 'unknown' NMIs I generated with various perf runs have disappeared. I'll post a new version of my nmi notifier rewrite soon. This will fail when the system is idle. Heh. I don't think I explained what I was doing properly. I had a bunch of perf runs going simultaneously on my Core2quad. Probably generated around 40,000 NMIs in a few minutes. I think around a 1000 or so detected back-to-back NMIs. With the current NMI detection algorithm in perf to determine back-to-back NMIs, I can usually use the above scenario to generate an 'unknown' NMI. Actually numerous ones before the box freezes. It is a false positive. With my current code and Avi's idea, all those disappeared as they are now 'swallowed' by the algorithm. That seemed like a positive. However, being cautious, I decided to instrument lkdtm to inject NMIs to force unknown NMI conditions (apic-send_IPI_mask(cpumask_of(smp_processor_id(), NMI_VECTOR))) I tried to inject the NMIs while my trace_printk buffer was flooding my screen with 'back-to-back NMIs detected'. I did this 4 or 5 times and every single one of them were detected as an 'unknown' NMI. So this was good too, in the sense it was not swallowing 'real' unknown NMIs. Does that make sense? Or are you saying an NMI in an idle system will have the same %rip thus falsely detecting a back-to-back NMI? Cheers, Don -Andi -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
Or are you saying an NMI in an idle system will have the same %rip thus falsely detecting a back-to-back NMI? Yup. Another problem is very long running instructions, like WBINVD and some others. If there's a high frequency NMI it may well hit multiple times in a single instance. -Andi -- a...@linux.intel.com -- Speaking for myself only. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
On Tue, Sep 13, 2011 at 04:53:18PM -0400, Don Zickus wrote: On Tue, Sep 13, 2011 at 09:58:38PM +0200, Andi Kleen wrote: Or are you saying an NMI in an idle system will have the same %rip thus falsely detecting a back-to-back NMI? Yup. Hmm. That sucks. Is there another register that can be used in conjunction to seperate it, like sp or something? Or we can we assume Not that I know of. than an idle cpu isn't doing much for local NMI IPIs and that the only NMIs that would interrupt it would be external ones? There can be still activity on the idle system, e.g. SMM or some Hypervisor in the background. If you profile events those might trigger samples, but they will be all accounted to the MWAIT. Another problem is very long running instructions, like WBINVD and some others. If there's a high frequency NMI it may well hit multiple times in a single instance. I thought NMIs happen on instruction boundaries, maybe not. Yes, but there may be multiple queued up when the instruction has finished executing. So you get multiple at the boundary. Honestly, our current implementation would probably be tripped up with those examples too, so I don't think I am making things worse (assuming the only high frequency NMI is coming from perf). I wish perf/oprofile would just stop using NMIs. The interrupt off regions are getting smaller and smaller and they can be profiled in a limited way using PEBS anyways. Or maybe make it a knob with default to off. -Andi -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
On 09/08/2011 08:29 PM, Jeremy Fitzhardinge wrote: I don't think it's that expensive, especially compared to the double-context-switch and vmexit of the spinner going to sleep. On AMD we do have to take an extra vmexit (on IRET) though. Fair enough - so if the vcpu blocks itself, it ends up being rescheduled in the NMI handler, which then returns to the lock slowpath. And if its a normal hlt, then you can also take interrupts if they're enabled while spinning. Yes. To be clear, just execute 'hlt' and inherit the interrupt enable flag from the environment. And if you get nested NMIs (since you can get multiple spurious kicks, or from other NMI sources), then one NMI will get latched and any others will get dropped? While we're in the NMI handler, any further NMIs will be collapsed and queued (so one NMI can be in service and just one other queued behind it). We can detect this condition by checking %rip on stack. Well we could have a specialized sleep/wakeup hypercall pair like Xen, but I'd like to avoid it if at all possible. Yeah, that's something that just falls out of the existing event channel machinery, so it isn't something that I specifically added. But it does mean that you simply end up with a hypercall returning on kick, with no real complexities. It also has to return on interrupt, MNI, INIT etc. No real complexities is a meaningless phrase on x86, though it is fertile ground for math puns. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
On 09/07/2011 10:09 PM, Jeremy Fitzhardinge wrote: On 09/07/2011 10:41 AM, Avi Kivity wrote: Hm, I'm interested to know what you're thinking in more detail. Can you leave an NMI pending before you block in the same way you can with sti;halt with normal interrupts? Nope. But you can do if (regs-rip in critical section) regs-rip = after_halt; and effectively emulate it. The critical section is something like critical_section_start: if (woken_up) goto critical_section_end; hlt critical_section_end: Hm. It's a pity you have to deliver an actual interrupt to implement the kick though. I don't think it's that expensive, especially compared to the double-context-switch and vmexit of the spinner going to sleep. On AMD we do have to take an extra vmexit (on IRET) though. I was thinking you might want to do something with monitor/mwait to implement the blocking/kick ops. (Handwave) monitor/mwait are incredibly expensive to virtualize since they require write-protecting a page, IPIs flying everywhere and flushing tlbs, not to mention my lovely hugepages being broken up mercilessly. Or what about a futex-like hypercall? Well we could have a specialized sleep/wakeup hypercall pair like Xen, but I'd like to avoid it if at all possible. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
On 09/08/2011 12:51 AM, Avi Kivity wrote: On 09/07/2011 10:09 PM, Jeremy Fitzhardinge wrote: On 09/07/2011 10:41 AM, Avi Kivity wrote: Hm, I'm interested to know what you're thinking in more detail. Can you leave an NMI pending before you block in the same way you can with sti;halt with normal interrupts? Nope. But you can do if (regs-rip in critical section) regs-rip = after_halt; and effectively emulate it. The critical section is something like critical_section_start: if (woken_up) goto critical_section_end; hlt critical_section_end: Hm. It's a pity you have to deliver an actual interrupt to implement the kick though. I don't think it's that expensive, especially compared to the double-context-switch and vmexit of the spinner going to sleep. On AMD we do have to take an extra vmexit (on IRET) though. Fair enough - so if the vcpu blocks itself, it ends up being rescheduled in the NMI handler, which then returns to the lock slowpath. And if its a normal hlt, then you can also take interrupts if they're enabled while spinning. And if you get nested NMIs (since you can get multiple spurious kicks, or from other NMI sources), then one NMI will get latched and any others will get dropped? Well we could have a specialized sleep/wakeup hypercall pair like Xen, but I'd like to avoid it if at all possible. Yeah, that's something that just falls out of the existing event channel machinery, so it isn't something that I specifically added. But it does mean that you simply end up with a hypercall returning on kick, with no real complexities. J -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
On 09/07/2011 06:56 PM, Don Zickus wrote: And hope that no other NMI was generated while we're handling this one. It's a little... fragile? No. If another NMI is generated while we are processing the current one it should get latched. Upon completion of the current one, the cpu should jump right back into the nmi exception routine again. The only downside is when multiple NMIs come in during the processing of the current one. Only one can be latched, so the others get dropped. Ah, yes, I remember now. But we are addressing that. May I ask how? Detecting a back-to-back NMI? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
On 09/07/2011 04:44 PM, Don Zickus wrote: Is there a way to tell whether an NMI was internally or externally generated? I don't think so, especially as two or more NMIs can be coalesced. So any NMI received on this first cpu has to check the NMI reason port? Well we cheat and execute all the nmi handlers first. If they come back as handled, we skip the check for the external NMI. And hope that no other NMI was generated while we're handling this one. It's a little... fragile? But you are right, other than checking the reason port, there isn't a way to determine if an NMI is internally or externally generated. Ouch. But on the other hand, I don't really care if you can say that this path will never be called in a virtual machine. Does virtual machines support hot remove of cpus? Probably not considering bare-metal barely supports it. They do. But vcpus probably don't have the notion of a bsp cpu, so perhaps virtual machines can get away with it easier? (I don't know enough about the hot cpu remove code to really explain it, just enough to know it can cause problems and people are trying to address it). The concept of a bsp exists in exactly the same way as on real hardware. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
On Wed, Sep 07, 2011 at 06:11:14PM +0300, Avi Kivity wrote: On 09/07/2011 04:44 PM, Don Zickus wrote: Is there a way to tell whether an NMI was internally or externally generated? I don't think so, especially as two or more NMIs can be coalesced. So any NMI received on this first cpu has to check the NMI reason port? Well we cheat and execute all the nmi handlers first. If they come back as handled, we skip the check for the external NMI. And hope that no other NMI was generated while we're handling this one. It's a little... fragile? No. If another NMI is generated while we are processing the current one it should get latched. Upon completion of the current one, the cpu should jump right back into the nmi exception routine again. The only downside is when multiple NMIs come in during the processing of the current one. Only one can be latched, so the others get dropped. But we are addressing that. Cheers, Don But you are right, other than checking the reason port, there isn't a way to determine if an NMI is internally or externally generated. Ouch. But on the other hand, I don't really care if you can say that this path will never be called in a virtual machine. Does virtual machines support hot remove of cpus? Probably not considering bare-metal barely supports it. They do. But vcpus probably don't have the notion of a bsp cpu, so perhaps virtual machines can get away with it easier? (I don't know enough about the hot cpu remove code to really explain it, just enough to know it can cause problems and people are trying to address it). The concept of a bsp exists in exactly the same way as on real hardware. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
On Wed, Sep 07, 2011 at 07:13:58AM +0300, Avi Kivity wrote: On 09/06/2011 09:27 PM, Don Zickus wrote: On Tue, Sep 06, 2011 at 11:07:26AM -0700, Jeremy Fitzhardinge wrote: But, erm, does that even make sense? I'm assuming the NMI reason port tells the CPU why it got an NMI. If multiple CPUs can get NMIs and there's only a single reason port, then doesn't that mean that either 1) they all got the NMI for the same reason, or 2) having a single port is inherently racy? How does the locking actually work there? The reason port is for an external/system NMI. All the IPI-NMI don't need to access this register to process their handlers, ie perf. I think in general the IOAPIC is configured to deliver the external NMI to one cpu, usually the bsp cpu. However, there has been a slow movement to free the bsp cpu from exceptions like this to allow one to eventually hot-swap the bsp cpu. The spin locks in that code were an attempt to be more abstract about who really gets the external NMI. Of course SGI's box is setup to deliver an external NMI to all cpus to dump the stack when the system isn't behaving. This is a very low usage NMI (in fact almost all cases lead to loud console messages). Hope that clears up some of the confusion. Hm, not really. What does it mean if two CPUs go down that path? Should one do some NMI processing while the other waits around for it to finish, and then do some NMI processing on its own? Well the time the second one gets to the external NMI it should have been cleared by the first cpu, which would of course lead to the second cpu causing a 'Dazed and confused' message. But on most x86 machines only one cpu should be routed the external NMI. Though there is an SGI box that is designed to send an external NMI to all of its cpus. Is there a way to tell whether an NMI was internally or externally generated? I don't think so, especially as two or more NMIs can be coalesced. So any NMI received on this first cpu has to check the NMI reason port? Well we cheat and execute all the nmi handlers first. If they come back as handled, we skip the check for the external NMI. But you are right, other than checking the reason port, there isn't a way to determine if an NMI is internally or externally generated. But on the other hand, I don't really care if you can say that this path will never be called in a virtual machine. Does virtual machines support hot remove of cpus? Probably not considering bare-metal barely supports it. They do. But vcpus probably don't have the notion of a bsp cpu, so perhaps virtual machines can get away with it easier? (I don't know enough about the hot cpu remove code to really explain it, just enough to know it can cause problems and people are trying to address it). Cheers, Don -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
On Wed, Sep 07, 2011 at 07:25:24PM +0300, Avi Kivity wrote: On 09/07/2011 06:56 PM, Don Zickus wrote: And hope that no other NMI was generated while we're handling this one. It's a little... fragile? No. If another NMI is generated while we are processing the current one it should get latched. Upon completion of the current one, the cpu should jump right back into the nmi exception routine again. The only downside is when multiple NMIs come in during the processing of the current one. Only one can be latched, so the others get dropped. Ah, yes, I remember now. But we are addressing that. May I ask how? Detecting a back-to-back NMI? Pretty boring actually. Currently we execute an NMI handler until one of them returns handled. Then we stop. This may cause us to miss an NMI in the case of multiple NMIs at once. Now we are changing it to execute _all_ the handlers to make sure we didn't miss one. But then the downside here is we accidentally handle an NMI that was latched. This would cause a 'Dazed on confused' message as that NMI was already handled by the previous NMI. We are working on an algorithm to detect this condition and flag it (nothing complicated). But it may never be perfect. On the other hand, what else are we going to do with an edge-triggered shared interrupt line? Cheers, Don -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
On 09/07/2011 07:52 PM, Don Zickus wrote: May I ask how? Detecting a back-to-back NMI? Pretty boring actually. Currently we execute an NMI handler until one of them returns handled. Then we stop. This may cause us to miss an NMI in the case of multiple NMIs at once. Now we are changing it to execute _all_ the handlers to make sure we didn't miss one. That's going to be pretty bad for kvm - those handlers become a lot more expensive since they involve reading MSRs. Even worse if we start using NMIs as a wakeup for pv spinlocks as provided by this patchset. But then the downside here is we accidentally handle an NMI that was latched. This would cause a 'Dazed on confused' message as that NMI was already handled by the previous NMI. We are working on an algorithm to detect this condition and flag it (nothing complicated). But it may never be perfect. On the other hand, what else are we going to do with an edge-triggered shared interrupt line? How about, during NMI, save %rip to a per-cpu variable. Handle just one cause. If, on the next NMI, we hit the same %rip, assume back-to-back NMI has occured and now handle all causes. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
On 09/07/2011 10:09 AM, Avi Kivity wrote: On 09/07/2011 07:52 PM, Don Zickus wrote: May I ask how? Detecting a back-to-back NMI? Pretty boring actually. Currently we execute an NMI handler until one of them returns handled. Then we stop. This may cause us to miss an NMI in the case of multiple NMIs at once. Now we are changing it to execute _all_ the handlers to make sure we didn't miss one. That's going to be pretty bad for kvm - those handlers become a lot more expensive since they involve reading MSRs. How often are you going to get NMIs in a kvm guest? Even worse if we start using NMIs as a wakeup for pv spinlocks as provided by this patchset. Hm, I'm interested to know what you're thinking in more detail. Can you leave an NMI pending before you block in the same way you can with sti;halt with normal interrupts? I was thinking you might want to do something with monitor/mwait to implement the blocking/kick ops. (Handwave) J -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
On Wed, Sep 07, 2011 at 08:09:37PM +0300, Avi Kivity wrote: On 09/07/2011 07:52 PM, Don Zickus wrote: May I ask how? Detecting a back-to-back NMI? Pretty boring actually. Currently we execute an NMI handler until one of them returns handled. Then we stop. This may cause us to miss an NMI in the case of multiple NMIs at once. Now we are changing it to execute _all_ the handlers to make sure we didn't miss one. That's going to be pretty bad for kvm - those handlers become a lot more expensive since they involve reading MSRs. Even worse if we start using NMIs as a wakeup for pv spinlocks as provided by this patchset. Oh. But then the downside here is we accidentally handle an NMI that was latched. This would cause a 'Dazed on confused' message as that NMI was already handled by the previous NMI. We are working on an algorithm to detect this condition and flag it (nothing complicated). But it may never be perfect. On the other hand, what else are we going to do with an edge-triggered shared interrupt line? How about, during NMI, save %rip to a per-cpu variable. Handle just one cause. If, on the next NMI, we hit the same %rip, assume back-to-back NMI has occured and now handle all causes. I had a similar idea a couple of months ago while debugging a continuous flow of back-to-back NMIs from a stress-test perf application and I couldn't get it to work. But let me try it again, because it does make sense as an optimization. Thanks, Don -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
On 09/07/2011 08:17 PM, Jeremy Fitzhardinge wrote: On 09/07/2011 10:09 AM, Avi Kivity wrote: On 09/07/2011 07:52 PM, Don Zickus wrote: May I ask how? Detecting a back-to-back NMI? Pretty boring actually. Currently we execute an NMI handler until one of them returns handled. Then we stop. This may cause us to miss an NMI in the case of multiple NMIs at once. Now we are changing it to execute _all_ the handlers to make sure we didn't miss one. That's going to be pretty bad for kvm - those handlers become a lot more expensive since they involve reading MSRs. How often are you going to get NMIs in a kvm guest? We'll soon have the perf-based watchdog firing every 60s worth of instructions or so. But if we implement your new kick pvop using NMI then it can be _very_ often. Even worse if we start using NMIs as a wakeup for pv spinlocks as provided by this patchset. Hm, I'm interested to know what you're thinking in more detail. Can you leave an NMI pending before you block in the same way you can with sti;halt with normal interrupts? Nope. But you can do if (regs-rip in critical section) regs-rip = after_halt; and effectively emulate it. The critical section is something like critical_section_start: if (woken_up) goto critical_section_end; hlt critical_section_end: I was thinking you might want to do something with monitor/mwait to implement the blocking/kick ops. (Handwave) monitor/mwait are incredibly expensive to virtualize since they require write-protecting a page, IPIs flying everywhere and flushing tlbs, not to mention my lovely hugepages being broken up mercilessly. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
On 09/07/2011 08:21 PM, Don Zickus wrote: How about, during NMI, save %rip to a per-cpu variable. Handle just one cause. If, on the next NMI, we hit the same %rip, assume back-to-back NMI has occured and now handle all causes. I had a similar idea a couple of months ago while debugging a continuous flow of back-to-back NMIs from a stress-test perf application and I couldn't get it to work. But let me try it again, because it does make sense as an optimization. Great, thanks. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
On 09/07/2011 10:41 AM, Avi Kivity wrote: Hm, I'm interested to know what you're thinking in more detail. Can you leave an NMI pending before you block in the same way you can with sti;halt with normal interrupts? Nope. But you can do if (regs-rip in critical section) regs-rip = after_halt; and effectively emulate it. The critical section is something like critical_section_start: if (woken_up) goto critical_section_end; hlt critical_section_end: Hm. It's a pity you have to deliver an actual interrupt to implement the kick though. I was thinking you might want to do something with monitor/mwait to implement the blocking/kick ops. (Handwave) monitor/mwait are incredibly expensive to virtualize since they require write-protecting a page, IPIs flying everywhere and flushing tlbs, not to mention my lovely hugepages being broken up mercilessly. Or what about a futex-like hypercall? J -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
On Fri, Sep 02, 2011 at 02:50:53PM -0700, Jeremy Fitzhardinge wrote: On 09/02/2011 01:47 PM, Peter Zijlstra wrote: On Fri, 2011-09-02 at 12:29 -0700, Jeremy Fitzhardinge wrote: I know that its generally considered bad form, but there's at least one spinlock that's only taken from NMI context and thus hasn't got any deadlock potential. Which one? arch/x86/kernel/traps.c:nmi_reason_lock It serializes NMI access to the NMI reason port across CPUs. Ah, OK. Well, that will never happen in a PV Xen guest. But PV ticketlocks are equally applicable to an HVM Xen domain (and KVM guest), so I guess there's at least some chance there could be a virtual emulated NMI. Maybe? Does qemu do that kind of thing? But, erm, does that even make sense? I'm assuming the NMI reason port tells the CPU why it got an NMI. If multiple CPUs can get NMIs and there's only a single reason port, then doesn't that mean that either 1) they all got the NMI for the same reason, or 2) having a single port is inherently racy? How does the locking actually work there? The reason port is for an external/system NMI. All the IPI-NMI don't need to access this register to process their handlers, ie perf. I think in general the IOAPIC is configured to deliver the external NMI to one cpu, usually the bsp cpu. However, there has been a slow movement to free the bsp cpu from exceptions like this to allow one to eventually hot-swap the bsp cpu. The spin locks in that code were an attempt to be more abstract about who really gets the external NMI. Of course SGI's box is setup to deliver an external NMI to all cpus to dump the stack when the system isn't behaving. This is a very low usage NMI (in fact almost all cases lead to loud console messages). Hope that clears up some of the confusion. Cheers, Don -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
On 09/06/2011 08:14 AM, Don Zickus wrote: On Fri, Sep 02, 2011 at 02:50:53PM -0700, Jeremy Fitzhardinge wrote: On 09/02/2011 01:47 PM, Peter Zijlstra wrote: On Fri, 2011-09-02 at 12:29 -0700, Jeremy Fitzhardinge wrote: I know that its generally considered bad form, but there's at least one spinlock that's only taken from NMI context and thus hasn't got any deadlock potential. Which one? arch/x86/kernel/traps.c:nmi_reason_lock It serializes NMI access to the NMI reason port across CPUs. Ah, OK. Well, that will never happen in a PV Xen guest. But PV ticketlocks are equally applicable to an HVM Xen domain (and KVM guest), so I guess there's at least some chance there could be a virtual emulated NMI. Maybe? Does qemu do that kind of thing? But, erm, does that even make sense? I'm assuming the NMI reason port tells the CPU why it got an NMI. If multiple CPUs can get NMIs and there's only a single reason port, then doesn't that mean that either 1) they all got the NMI for the same reason, or 2) having a single port is inherently racy? How does the locking actually work there? The reason port is for an external/system NMI. All the IPI-NMI don't need to access this register to process their handlers, ie perf. I think in general the IOAPIC is configured to deliver the external NMI to one cpu, usually the bsp cpu. However, there has been a slow movement to free the bsp cpu from exceptions like this to allow one to eventually hot-swap the bsp cpu. The spin locks in that code were an attempt to be more abstract about who really gets the external NMI. Of course SGI's box is setup to deliver an external NMI to all cpus to dump the stack when the system isn't behaving. This is a very low usage NMI (in fact almost all cases lead to loud console messages). Hope that clears up some of the confusion. Hm, not really. What does it mean if two CPUs go down that path? Should one do some NMI processing while the other waits around for it to finish, and then do some NMI processing on its own? It sounds like that could only happen if you reroute NMI from one CPU to another while the first CPU is actually in the middle of processing an NMI - in which case, shouldn't the code doing the re-routing be taking the spinlock? Or perhaps a spinlock isn't the right primitive to use at all? Couldn't the second CPU just set a flag/counter (using something like an atomic add/cmpxchg/etc) to make the first CPU process the second NMI? But on the other hand, I don't really care if you can say that this path will never be called in a virtual machine. J -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
On Tue, Sep 06, 2011 at 11:07:26AM -0700, Jeremy Fitzhardinge wrote: But, erm, does that even make sense? I'm assuming the NMI reason port tells the CPU why it got an NMI. If multiple CPUs can get NMIs and there's only a single reason port, then doesn't that mean that either 1) they all got the NMI for the same reason, or 2) having a single port is inherently racy? How does the locking actually work there? The reason port is for an external/system NMI. All the IPI-NMI don't need to access this register to process their handlers, ie perf. I think in general the IOAPIC is configured to deliver the external NMI to one cpu, usually the bsp cpu. However, there has been a slow movement to free the bsp cpu from exceptions like this to allow one to eventually hot-swap the bsp cpu. The spin locks in that code were an attempt to be more abstract about who really gets the external NMI. Of course SGI's box is setup to deliver an external NMI to all cpus to dump the stack when the system isn't behaving. This is a very low usage NMI (in fact almost all cases lead to loud console messages). Hope that clears up some of the confusion. Hm, not really. What does it mean if two CPUs go down that path? Should one do some NMI processing while the other waits around for it to finish, and then do some NMI processing on its own? Well the time the second one gets to the external NMI it should have been cleared by the first cpu, which would of course lead to the second cpu causing a 'Dazed and confused' message. But on most x86 machines only one cpu should be routed the external NMI. Though there is an SGI box that is designed to send an external NMI to all of its cpus. It sounds like that could only happen if you reroute NMI from one CPU to another while the first CPU is actually in the middle of processing an NMI - in which case, shouldn't the code doing the re-routing be taking the spinlock? Perhaps, but like I said it is a slow transition because most people don't have the hardware to test this (nor does it work due to other limitations). Or perhaps a spinlock isn't the right primitive to use at all? Couldn't the second CPU just set a flag/counter (using something like an atomic add/cmpxchg/etc) to make the first CPU process the second NMI? Might be a smarter approach. Like I said it is hard to test without functioning hardware. :-( But on the other hand, I don't really care if you can say that this path will never be called in a virtual machine. Does virtual machines support hot remove of cpus? Probably not considering bare-metal barely supports it. Cheers, Don -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
On 09/06/2011 11:27 AM, Don Zickus wrote: But on the other hand, I don't really care if you can say that this path will never be called in a virtual machine. Does virtual machines support hot remove of cpus? Probably not considering bare-metal barely supports it. The only reason you'd want to is to add/remove VCPUs as a mechanism of resource control, so if you were removing a VCPU it wouldn't matter much which one you choose. In other words, there's no reason you'd ever need to remove the BSP in favour of one of the other CPUs. Anyway, I'm not going to lose any sleep over this issue. J -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
On 09/06/2011 09:27 PM, Don Zickus wrote: On Tue, Sep 06, 2011 at 11:07:26AM -0700, Jeremy Fitzhardinge wrote: But, erm, does that even make sense? I'm assuming the NMI reason port tells the CPU why it got an NMI. If multiple CPUs can get NMIs and there's only a single reason port, then doesn't that mean that either 1) they all got the NMI for the same reason, or 2) having a single port is inherently racy? How does the locking actually work there? The reason port is for an external/system NMI. All the IPI-NMI don't need to access this register to process their handlers, ie perf. I think in general the IOAPIC is configured to deliver the external NMI to one cpu, usually the bsp cpu. However, there has been a slow movement to free the bsp cpu from exceptions like this to allow one to eventually hot-swap the bsp cpu. The spin locks in that code were an attempt to be more abstract about who really gets the external NMI. Of course SGI's box is setup to deliver an external NMI to all cpus to dump the stack when the system isn't behaving. This is a very low usage NMI (in fact almost all cases lead to loud console messages). Hope that clears up some of the confusion. Hm, not really. What does it mean if two CPUs go down that path? Should one do some NMI processing while the other waits around for it to finish, and then do some NMI processing on its own? Well the time the second one gets to the external NMI it should have been cleared by the first cpu, which would of course lead to the second cpu causing a 'Dazed and confused' message. But on most x86 machines only one cpu should be routed the external NMI. Though there is an SGI box that is designed to send an external NMI to all of its cpus. Is there a way to tell whether an NMI was internally or externally generated? I don't think so, especially as two or more NMIs can be coalesced. So any NMI received on this first cpu has to check the NMI reason port? But on the other hand, I don't really care if you can say that this path will never be called in a virtual machine. Does virtual machines support hot remove of cpus? Probably not considering bare-metal barely supports it. They do. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
CC'ing Keir. On Fri, 2 Sep 2011, Jeremy Fitzhardinge wrote: On 09/02/2011 01:47 PM, Peter Zijlstra wrote: On Fri, 2011-09-02 at 12:29 -0700, Jeremy Fitzhardinge wrote: I know that its generally considered bad form, but there's at least one spinlock that's only taken from NMI context and thus hasn't got any deadlock potential. Which one? arch/x86/kernel/traps.c:nmi_reason_lock It serializes NMI access to the NMI reason port across CPUs. Ah, OK. Well, that will never happen in a PV Xen guest. But PV ticketlocks are equally applicable to an HVM Xen domain (and KVM guest), so I guess there's at least some chance there could be a virtual emulated NMI. Maybe? Does qemu do that kind of thing? Xen knows how to inject NMIs to HVM guests, even though I am not sure if it is actually done in practice at the moment. However digging into the implementation details, it looks like virtual NMIs are not injected if blocking-by-STI (or blocking-by-MOV-SS), so we should be fine, even though I don't know if you actually want to rely on this: /* * We can only inject an NMI if no blocking by MOV SS (also, depending on * implementation, if no blocking by STI). If pin-based 'virtual NMIs' * control is specified then the NMI-blocking interruptibility flag is * also checked. The 'virtual NMI pending' control (available only in * conjunction with 'virtual NMIs') causes a VM exit when all these checks * succeed. It will exit immediately after VM entry if the checks succeed * at that point. * * Because a processor may or may not check blocking-by-STI when injecting * a virtual NMI, it will be necessary to convert that to block-by-MOV-SS * before specifying the 'virtual NMI pending' control. Otherwise we could * enter an infinite loop where we check blocking-by-STI in software and * thus delay delivery of a virtual NMI, but the processor causes immediate * VM exit because it does not check blocking-by-STI. */ -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
On 09/03/2011 12:50 AM, Jeremy Fitzhardinge wrote: On 09/02/2011 01:47 PM, Peter Zijlstra wrote: On Fri, 2011-09-02 at 12:29 -0700, Jeremy Fitzhardinge wrote: I know that its generally considered bad form, but there's at least one spinlock that's only taken from NMI context and thus hasn't got any deadlock potential. Which one? arch/x86/kernel/traps.c:nmi_reason_lock It serializes NMI access to the NMI reason port across CPUs. Ah, OK. Well, that will never happen in a PV Xen guest. But PV ticketlocks are equally applicable to an HVM Xen domain (and KVM guest), so I guess there's at least some chance there could be a virtual emulated NMI. Maybe? Does qemu do that kind of thing? kvm does. In fact, I want to use 'hlt' for blocking vcpus in the slowpath and an NMI IPI for waking them up. That's not going to work in an NMI, but I guess I can replace the 'hlt' with a 'pause' if we're in an NMI, and just spin there. btw, doing a race-free NMI wakeup is going to be interesting - we'll have to look at %rip and see if is just before our hlt, since there's no magic sti; hlt sequence we can use. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
On Thu, 2011-09-01 at 17:55 -0700, Jeremy Fitzhardinge wrote: From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com We need to make sure interrupts are disabled while we're relying on the contents of the per-cpu lock_waiting values, otherwise an interrupt handler could come in, try to take some other lock, block, and overwrite our values. Would this make it illegal to take a spinlock from NMI context? I know that its generally considered bad form, but there's at least one spinlock that's only taken from NMI context and thus hasn't got any deadlock potential. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
On 09/02/2011 07:47 AM, Peter Zijlstra wrote: On Thu, 2011-09-01 at 17:55 -0700, Jeremy Fitzhardinge wrote: From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com We need to make sure interrupts are disabled while we're relying on the contents of the per-cpu lock_waiting values, otherwise an interrupt handler could come in, try to take some other lock, block, and overwrite our values. Would this make it illegal to take a spinlock from NMI context? That would be problematic. But a Xen domain wouldn't be getting NMIs - at least not standard x86 ones - so that's moot. I know that its generally considered bad form, but there's at least one spinlock that's only taken from NMI context and thus hasn't got any deadlock potential. Which one? J -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
On Fri, 2011-09-02 at 12:29 -0700, Jeremy Fitzhardinge wrote: I know that its generally considered bad form, but there's at least one spinlock that's only taken from NMI context and thus hasn't got any deadlock potential. Which one? arch/x86/kernel/traps.c:nmi_reason_lock It serializes NMI access to the NMI reason port across CPUs. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
On 09/02/2011 01:47 PM, Peter Zijlstra wrote: On Fri, 2011-09-02 at 12:29 -0700, Jeremy Fitzhardinge wrote: I know that its generally considered bad form, but there's at least one spinlock that's only taken from NMI context and thus hasn't got any deadlock potential. Which one? arch/x86/kernel/traps.c:nmi_reason_lock It serializes NMI access to the NMI reason port across CPUs. Ah, OK. Well, that will never happen in a PV Xen guest. But PV ticketlocks are equally applicable to an HVM Xen domain (and KVM guest), so I guess there's at least some chance there could be a virtual emulated NMI. Maybe? Does qemu do that kind of thing? But, erm, does that even make sense? I'm assuming the NMI reason port tells the CPU why it got an NMI. If multiple CPUs can get NMIs and there's only a single reason port, then doesn't that mean that either 1) they all got the NMI for the same reason, or 2) having a single port is inherently racy? How does the locking actually work there? J -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
On Fri, 2011-09-02 at 14:50 -0700, Jeremy Fitzhardinge wrote: On 09/02/2011 01:47 PM, Peter Zijlstra wrote: On Fri, 2011-09-02 at 12:29 -0700, Jeremy Fitzhardinge wrote: I know that its generally considered bad form, but there's at least one spinlock that's only taken from NMI context and thus hasn't got any deadlock potential. Which one? arch/x86/kernel/traps.c:nmi_reason_lock It serializes NMI access to the NMI reason port across CPUs. Ah, OK. Well, that will never happen in a PV Xen guest. But PV ticketlocks are equally applicable to an HVM Xen domain (and KVM guest), so I guess there's at least some chance there could be a virtual emulated NMI. Maybe? Does qemu do that kind of thing? Afaik qemu/kvm can do the whole NMI thing, yes. But, erm, does that even make sense? I'm assuming the NMI reason port tells the CPU why it got an NMI. If multiple CPUs can get NMIs and there's only a single reason port, then doesn't that mean that either 1) they all got the NMI for the same reason, or 2) having a single port is inherently racy? How does the locking actually work there? I really wouldn't know, the whole NMI thing is a bit of a trainwreck architecturally. Maybe the x86 maintainers or Linus knows more on this aspect of it. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
But, erm, does that even make sense? I'm assuming the NMI reason port tells the CPU why it got an NMI. If multiple CPUs can get NMIs and there's only a single reason port, then doesn't that mean that either 1) they all got the NMI for the same reason, or 2) having a single port is inherently racy? How does the locking actually work there? All the code to determine NMI reasons is inherently racy, and each new NMI user makes it worse. -Andi -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 08/13] xen/pvticketlock: disable interrupts while blocking
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com We need to make sure interrupts are disabled while we're relying on the contents of the per-cpu lock_waiting values, otherwise an interrupt handler could come in, try to take some other lock, block, and overwrite our values. Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com --- arch/x86/xen/spinlock.c |8 1 files changed, 8 insertions(+), 0 deletions(-) diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c index 4c46144..2ed5d05 100644 --- a/arch/x86/xen/spinlock.c +++ b/arch/x86/xen/spinlock.c @@ -98,6 +98,7 @@ static void xen_lock_spinning(struct arch_spinlock *lock, unsigned want) struct xen_lock_waiting *w = __get_cpu_var(lock_waiting); int cpu = smp_processor_id(); u64 start; + unsigned long flags; /* If kicker interrupts not initialized yet, just spin */ if (irq == -1) @@ -105,6 +106,10 @@ static void xen_lock_spinning(struct arch_spinlock *lock, unsigned want) start = spin_time_start(); + /* Make sure interrupts are disabled to ensure that these + per-cpu values are not overwritten. */ + local_irq_save(flags); + w-want = want; w-lock = lock; @@ -139,6 +144,9 @@ static void xen_lock_spinning(struct arch_spinlock *lock, unsigned want) out: cpumask_clear_cpu(cpu, waiting_cpus); w-lock = NULL; + + local_irq_restore(flags); + spin_time_accum_blocked(start); } PV_CALLEE_SAVE_REGS_THUNK(xen_lock_spinning); -- 1.7.6 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html