On Fri, 1 May 2015, Ingo Molnar wrote:

> So 0000fffffffffffe corresponds to 2 events left until overflow, 
> right? And on Haswell we don't set x86_pmu.limit_period AFAICS, so we 
> allow these super short periods.
> 
> Maybe like on Broadwell we need a quirk on Nehalem/Haswell as well, 
> one similar to bdw_limit_period()? Something like the patch below?
> 
> Totally untested and such. I picked 128 because of Broadwell, but 
> lower values might work as well. You could try to increase it to 3 and 
> upwards and see which one stops triggering stuck NMI loops?

I spent a lot of time trying to come up with a test case that triggers
this more reliably, but failed.

It is definitely an issue with PMC0 being set to -2, which causes the PMC0
bit in the status register to get stuck and not clear. Often there is also
a PEBS event active at the same time, but that might be a coincidence.

With your patch applied I can't trigger the issue. I haven't tried 
narrowing down the exact value yet.

Vince