On 20.07.2010, at 21:59, John Baldwin wrote:
>> I started narrowing the revisions down until I
>> found out, that while on r202386 I'm still able to trigger the MCE, r202387
>> seems to solve the problem on CURRENT:
>>
>> http://svn.freebsd.org/viewvc/base?view=revision&revision=202387
>
> Although this change was MFC'd, it was later disabled by default because it
> causes issues on other machines. I think there is a tunable you need to set
> in loader.conf to enable it for 8.1. Attilio (the author of that commit)
> should know which tunable to set.
Might be this one in sys/amd64/amd64/clock.c:
----
static int lapic_allclocks = 1;
TUNABLE_INT("machdep.lapic_allclocks", &lapic_allclocks);
----
The r202387 changes put this into local_apic.c; I guess it was moved later on (or
after the MFC), and that's why I couldn't find it on 8-stable. And, indeed, this
tunable seems to be gone again in current. I'm testing with
machdep.lapic_allclocks=0 right now. So far it looks very promising. I'll let
it run overnight.
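For anyone who wants to try the same setting, a minimal sketch of the
loader.conf entry follows. It assumes the kernel in question still carries the
machdep.lapic_allclocks tunable shown above; that may not hold for every
branch, so check clock.c in your tree first.

```sh
# /boot/loader.conf
# Assumption: this kernel still has the machdep.lapic_allclocks tunable
# (declared via TUNABLE_INT in sys/amd64/amd64/clock.c on 8-stable).
machdep.lapic_allclocks="0"
```

After a reboot, 'kenv machdep.lapic_allclocks' should print the value the
loader handed to the kernel.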
Another thing though: Today I compared the verbose boot output from 8-stable and
the current box. I saw that the ioapic sets up IRQ routing differently on these
two systems although the hardware is the same. This didn't seem very interesting
at first, but then I noticed that 8-stable sets up two routes (to lapic0 and
lapic2, or sometimes lapic3) for IRQ58 (mpt0), while current only uses one
route (to lapic0).
I used 'cpuset -c -l 0 -x 58' in an attempt to make my 8-stable box behave like
the one running current. Indeed, this seems to have changed IRQ58 to be routed
to lapic0 only. And the box was running for hours without showing the symptoms.
I just checked the verbose boot output of my 8-stable box again (booted with
machdep.lapic_allclocks=0 as mentioned above). And now it seems to have set up
IRQ routes just like the current box (one route for IRQ58 to lapic0).
So I don't get which issue came first... If either one is ruled out, the
problem seems to be gone. Was it the clock issue causing a wrong IRQ routing
setup, which in turn causes mpt or the CPU to go nuts? Or is mpt having two
interrupt routes actually a normal thing (then why doesn't current behave this
way?), but the mpt driver causes strange things when operating with clock
issues? Or have I misinterpreted something?
Here's the boot verbose output of ioapic related to interrupts 56 (em0), 57
(em1) and 58 (mpt0):
---- 1st X4100M2 - running 8-stable (machdep.lapic_allclocks=1, MCEs can be
reproduced easily) ----
# egrep '^ioapic' boot.normal | egrep 'IRQ 5[678]' | sort
ioapic2: routing intpin 0 (PCI IRQ 56) to lapic 0 vector 55
ioapic2: routing intpin 0 (PCI IRQ 56) to lapic 1 vector 50
ioapic2: routing intpin 1 (PCI IRQ 57) to lapic 0 vector 56
ioapic2: routing intpin 1 (PCI IRQ 57) to lapic 2 vector 50
ioapic2: routing intpin 2 (PCI IRQ 58) to lapic 0 vector 57
ioapic2: routing intpin 2 (PCI IRQ 58) to lapic 3 vector 50
----
---- 1st X4100M2 - running 8-stable (machdep.lapic_allclocks=0, test currently
running, no MCEs so far) ----
# egrep '^ioapic' boot.lapic_allclocks0 | egrep 'IRQ 5[678]' | sort
ioapic2: routing intpin 0 (PCI IRQ 56) to lapic 0 vector 55
ioapic2: routing intpin 0 (PCI IRQ 56) to lapic 2 vector 50
ioapic2: routing intpin 1 (PCI IRQ 57) to lapic 0 vector 56
ioapic2: routing intpin 1 (PCI IRQ 57) to lapic 3 vector 50
ioapic2: routing intpin 2 (PCI IRQ 58) to lapic 0 vector 57
----
---- 2nd X4100M2 - running current (MCEs cannot be reproduced) ----
# dmesg | egrep '^ioapic' | egrep 'IRQ 5[678]' | sort
ioapic2: routing intpin 0 (PCI IRQ 56) to lapic 0 vector 55
ioapic2: routing intpin 0 (PCI IRQ 56) to lapic 2 vector 50
ioapic2: routing intpin 1 (PCI IRQ 57) to lapic 0 vector 56
ioapic2: routing intpin 1 (PCI IRQ 57) to lapic 3 vector 50
ioapic2: routing intpin 2 (PCI IRQ 58) to lapic 0 vector 57
----
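To make the three listings easier to compare at a glance, the routes can be
summarised per IRQ. The following is only a rough sketch: the function name
ioapic_routes is made up, and it assumes the routing lines look exactly like
the ones quoted above (IRQ number in field 7, lapic id in field 10).

```sh
# Hypothetical helper: summarise ioapic routing lines per PCI IRQ.
# Reads the named files, or stdin when called without arguments.
ioapic_routes() {
    awk '
        # Match lines such as:
        #   ioapic2: routing intpin 2 (PCI IRQ 58) to lapic 0 vector 57
        /^ioapic/ && $5 == "(PCI" && $6 == "IRQ" && $9 == "lapic" {
            irq = $7
            sub(/\)$/, "", irq)          # strip the trailing ")"
            count[irq]++
            if (irq in lapics)
                lapics[irq] = lapics[irq] "," $10
            else
                lapics[irq] = $10
        }
        END {
            for (irq in count)
                printf "IRQ %s: %d route(s) to lapic %s\n",
                    irq, count[irq], lapics[irq]
        }
    ' "$@" | sort -k2,2n                 # awk array order is unspecified
}
```

Usage would be 'ioapic_routes boot.normal' or 'dmesg | ioapic_routes'. Fed
the last listing above, it reports two routes each for IRQ 56 and IRQ 57 but
only one route (to lapic 0) for IRQ 58.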
Markus