I mentioned this to Adrian, but I'll mention it here for everyone else's benefit.

Ryan is exactly right.  There was a thread a while ago, with a proposed patch 
from Kostik:


As I recall, Scott Long also ran into this a few months ago.

It happens for any NMI:  entering the debugger, a PCI Parity or System Error, a 
hardware watchdog timeout, and probably other sources I'm not remembering.


On 08/21/2015 09:23, Ryan Stone wrote:
> I have seen similar behaviour before.  The problem is that every CPU
> receives an NMI concurrently.  As I recall, one of them gets some kind of
> pseudo-spinlock and tries to stop the other CPUs with an NMI.  However,
> because they are already in an NMI handler, they don't get the second NMI
> and don't stop properly.
> The case that I saw actually had to do with a panic triggered by an NMI,
> not entering the debugger, but I believe that both cases use
> stop_cpus_hard() under the hood and have a similar issue.
> (I also recall seeing the exact situation that you describe while
> originally developing SR-IOV on an alpha version of the Fortville hardware
> and firmware with a very buggy SR-IOV implementation.  I've never seen it
> on ixgbe before, although I haven't used SR-IOV there very much at all)
> On Thu, Aug 20, 2015 at 6:15 PM, Adrian Chadd <adr...@freebsd.org> wrote:
>> Hi!
>> This has started happening on -HEAD recently. No, I don't have any
>> more details yet than "recently."
>> Whenever I get an NMI panic (and getting an NMI is a separate issue,
>> sigh) I get a slew of "failed to stop cpu" messages, and all CPUs
>> enter ddb. This is .. sub-optimal. Has anyone seen this? Does anyone
>> have any ideas?
>> -adrian

freebsd-current@freebsd.org mailing list
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"