On Tue, Mar 09, 2010 at 06:42:02PM -0600, Kevin Day wrote:
>
> On Mar 9, 2010, at 4:27 PM, John Baldwin wrote:
>
> > On Tuesday 09 March 2010 3:40:26 pm Kevin Day wrote:
> >>
> >>
> >> If I boot up on an Opteron 2218 system, it boots normally. If I boot the
> >> exact same VM moved to a 2352, I get:
> >>
> >> acpi0: <INTEL 440BX> on motherboard
> >> PCIe: Memory Mapped configuration base @ 0xe0000000
> >> (very long pause)
> >> ioapic0: routing intpin 9 (ISA IRQ 9) to lapic 0 vector 48
> >> acpi0: [MPSAFE]
> >> acpi0: [ITHREAD]
> >>
> >> then booting normally.
> >
> > It's probably worth adding some printfs to narrow down where the pause is
> > happening. This looks to be all during the acpi_attach() routine, so maybe
> > you can start there.
>
> Okay, good pointer. This is what I've narrowed down:
>
> acpi_enable_pcie() calls pcie_cfgregopen(). It's called here with
> pcie_cfgregopen(0xe0000000, 0, 255). inside pcie_cfgregopen, the pause starts
> here:
>
> /* XXX: We should make sure this really fits into the direct map. */
> pcie_base = (vm_offset_t)pmap_mapdev(base, (maxbus + 1) << 20);
>
> pmap_mapdev calls pmap_mapdev_attr, and in there this evaluates to true:
>
> /*
> * If the specified range of physical addresses fits within the direct
> * map window, use the direct map.
> */
> if (pa < dmaplimit && pa + size < dmaplimit) {
>
> so we call pmap_change_attr which calls pmap_change_attr_locked. It's
> changing 0x10000000 bytes starting at 0xffffff00e0000000. The very last line
> before returning from pmap_change_attr_locked is:
>
> pmap_invalidate_cache_range(base, tmpva);
>
> And this is where the delay is. This is calling MFENCE/CLFLUSH in a loop 8
> million times. We actually had a problem with CLFLUSH causing panics on these
> same CPUs under Xen, which is partially why we're looking at VMware now. (see
> kern/138863). I'm wondering if VMware didn't encounter the same problem and
> replace CLFLUSH with a software emulated version that is far slower... based
> on the speed, it is probably invalidating the entire cache. A quick change to
> pmap_invalidate_cache_range to just clear the entire cache if the area being
> cleared is over 8MB seems to have fixed it. i.e.:
>
> else if (cpu_feature & CPUID_CLFSH) {
>
> to
>
> else if ((cpu_feature & CPUID_CLFSH) && ((eva-sva) < (2<<22))) {
>
>
> However, I'm a little blurry on if everything leading to this point is
> correct. It's ending up with 256MB of memory for the pci area, which seems
> really excessive. Is the problem just that it wants room for 256 busses,
> or...? Anyone know this code path well enough to know if this is deviating
> from the norm?

I think that the idea not to use CLFLUSH in a loop for large regions is good.
We do not extract the L2/L3 cache size now; I suppose that a 2MB estimate
is good for most situations.

commit bbac1632d349d68b905df644656ce9a8e4aed094
Author: Konstantin Belousov <[email protected]>
Date:   Wed Mar 10 13:07:51 2010 +0200

    Fall back to wbinvd when region for CLFLUSH is >= 2MB.

    Submitted by:	Kevin Day <[email protected]>

diff --git a/sys/amd64/amd64/pmap.c b/sys/amd64/amd64/pmap.c
index 07db5d1..4361be0 100644
--- a/sys/amd64/amd64/pmap.c
+++ b/sys/amd64/amd64/pmap.c
@@ -994,7 +994,8 @@ pmap_invalidate_cache_range(vm_offset_t sva, vm_offset_t eva)
 
 	if (cpu_feature & CPUID_SS)
 		; /* If "Self Snoop" is supported, do nothing. */
-	else if (cpu_feature & CPUID_CLFSH) {
+	else if ((cpu_feature & CPUID_CLFSH) != 0 &&
+	    eva - sva < 2 * 1024 * 1024) {
 
 		/*
 		 * Otherwise, do per-cache line flush.  Use the mfence
@@ -1011,7 +1012,8 @@ pmap_invalidate_cache_range(vm_offset_t sva, vm_offset_t eva)
 
 		/*
 		 * No targeted cache flush methods are supported by CPU,
-		 * globally invalidate cache as a last resort.
+		 * or the supplied range is bigger then 2MB.
+		 * Globally invalidate cache.
 		 */
 		pmap_invalidate_cache();
 	}
diff --git a/sys/i386/i386/pmap.c b/sys/i386/i386/pmap.c
index 4b2e34f..f448071 100644
--- a/sys/i386/i386/pmap.c
+++ b/sys/i386/i386/pmap.c
@@ -996,7 +996,8 @@ pmap_invalidate_cache_range(vm_offset_t sva, vm_offset_t eva)
 
 	if (cpu_feature & CPUID_SS)
 		; /* If "Self Snoop" is supported, do nothing. */
-	else if (cpu_feature & CPUID_CLFSH) {
+	else if ((cpu_feature & CPUID_CLFSH) != 0 &&
+	    eva - sva < 2 * 1024 * 1024) {
 
 		/*
 		 * Otherwise, do per-cache line flush.  Use the mfence
@@ -1013,7 +1014,8 @@ pmap_invalidate_cache_range(vm_offset_t sva, vm_offset_t eva)
 
 		/*
 		 * No targeted cache flush methods are supported by CPU,
-		 * or the supplied range is bigger then 2MB.
+		 * globally invalidate cache as a last resort.
 		 */
 		pmap_invalidate_cache();
 	}
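
For anyone who wants to check the arithmetic outside the kernel, here is a
rough userland sketch of the decision the patch makes. It is not the kernel
code: choose_flush(), clflush_count() and FLUSH_THRESHOLD are names made up
for illustration, and the 64-byte line size used in the example is an
assumption (the kernel reads the real value into cpu_clflush_line_size).

```c
#include <stdint.h>

/* Hypothetical name; mirrors the 2MB cutoff used in the patch. */
#define FLUSH_THRESHOLD	(2ULL * 1024 * 1024)

enum flush_method {
	FLUSH_NONE,		/* Self Snoop: nothing to do */
	FLUSH_CLFLUSH_LOOP,	/* per-cache-line CLFLUSH */
	FLUSH_WBINVD		/* global write-back and invalidate */
};

/* Pick a flush strategy the same way the patched branch order does. */
static enum flush_method
choose_flush(int self_snoop, int has_clflush, uint64_t sva, uint64_t eva)
{
	if (self_snoop)
		return (FLUSH_NONE);
	if (has_clflush && eva - sva < FLUSH_THRESHOLD)
		return (FLUSH_CLFLUSH_LOOP);
	return (FLUSH_WBINVD);
}

/* Number of CLFLUSH iterations needed to cover size bytes. */
static uint64_t
clflush_count(uint64_t size, uint64_t line_size)
{
	return ((size + line_size - 1) / line_size);
}
```

With maxbus = 255 the mapping is (255 + 1) << 20 = 0x10000000 bytes, so
clflush_count(0x10000000, 64) comes to 4194304 iterations under the assumed
line size, and choose_flush(0, 1, 0, 0x10000000) picks the global
invalidation instead of the loop.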

