On Fri, 1 May 2015, Ingo Molnar wrote:

> So 0000fffffffffffe corresponds to 2 events left until overflow,
> right? And on Haswell we don't set x86_pmu.limit_period AFAICS, so we
> allow these super short periods.
>
> Maybe like on Broadwell we need a quirk on Nehalem/Haswell as well,
> one similar to bdw_limit_period()? Something like the patch below?
>
> Totally untested and such. I picked 128 because of Broadwell, but
> lower values might work as well. You could try to increase it to 3 and
> upwards and see which one stops triggering stuck NMI loops?

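For reference, the arithmetic in the quoted text can be sketched in user space roughly as below. This is only an illustration, not the proposed patch: it assumes 48-bit counters (which matches the 0000fffffffffffe value), and the names counter_start_value(), limit_period() and MIN_PERIOD are made up for the example. The floor of 128 is just the value borrowed from Broadwell; the smallest value that avoids the problem is not known.

```c
#include <stdint.h>

/* Assumption: 48-bit wide counters, matching the 0000fffffffffffe value
 * seen in PMC0. */
#define CNTVAL_BITS 48
#define CNTVAL_MASK ((UINT64_C(1) << CNTVAL_BITS) - 1)

/* Hypothetical floor, borrowed from the Broadwell quirk; lower values
 * might work as well. */
#define MIN_PERIOD 128

/* Value a counter would start from so that it overflows after 'left'
 * more events: the negated period truncated to the counter width. */
static uint64_t counter_start_value(uint64_t left)
{
	return (UINT64_C(0) - left) & CNTVAL_MASK;
}

/* Hypothetical clamp: never arm a counter with fewer than MIN_PERIOD
 * events left until overflow. */
static uint64_t limit_period(uint64_t left)
{
	return left < MIN_PERIOD ? MIN_PERIOD : left;
}
```

With these definitions, counter_start_value(2) gives 0x0000fffffffffffe, i.e. the "-2" case, while limit_period(2) raises the period to 128 so the counter would start from 0x0000ffffffffff80 instead.
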
I spent a lot of time trying to come up with a test case that triggered
this more reliably, but failed.

It definitely is an issue with PMC0 being -2, which causes the PMC0 bit
in the status register to get stuck and not clear. Often there is also a
PEBS event active at the same time, but that might be coincidence.

With your patch applied I can't trigger the issue. I haven't tried
narrowing down the exact value yet.

Vince