Following up on this: with a recent kernel the error message does
indeed go away. The poor performance remains, though (only a few
percent difference between 4.13 and 4.15-rc5), and I'm at a loss as to
whether it is related to MPI or not. I also see oddities I can't
explain, such as locking the job to the first 12 cores yielding 100%
greater performance than locking it to the last 12 cores; I can only
suspect some kind of MPI cache-partitioning issue.
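
In case it helps make the comparison concrete, here is a minimal sketch
of how the binding can be done with the hwloc API (not my actual
launcher). It binds the current process to the first 12 cores in
hwloc's logical order, which may differ from the OS numbering, and the
hard-coded count of 12 is obviously specific to a 24-core socket:

#include <stdio.h>
#include <stdlib.h>
#include <hwloc.h>

int main(void)
{
    hwloc_topology_t topo;
    hwloc_bitmap_t set;
    char *str;
    int i, ncores;

    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    ncores = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE);
    set = hwloc_bitmap_alloc();

    /* accumulate the cpusets of the first 12 cores
     * (use indices 12..23 instead for the "last 12 cores" run) */
    for (i = 0; i < 12 && i < ncores; i++) {
        hwloc_obj_t core = hwloc_get_obj_by_type(topo, HWLOC_OBJ_CORE, i);
        hwloc_bitmap_or(set, set, core->cpuset);
    }

    /* bind the whole process to the accumulated cpuset */
    if (hwloc_set_cpubind(topo, set, HWLOC_CPUBIND_PROCESS) < 0)
        perror("hwloc_set_cpubind");

    hwloc_bitmap_asprintf(&str, set);
    printf("bound to cpuset %s\n", str);
    free(str);

    hwloc_bitmap_free(set);
    hwloc_topology_destroy(topo);
    return 0;
}

From the shell, something like "hwloc-bind core:0-11 -- ./app" (or
core:12-23 for the other half, with ./app standing in for the real
binary) should be equivalent.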

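Also, as a sanity check that the fix from the commit quoted below
really changed what the kernel exports, something like the following
(plain sysfs, no hwloc) dumps the per-CPU L3 sharing lists. It assumes
index3 is the L3 cache, which the "level" file next to it can confirm:

#include <stdio.h>

#define MAX_CPUS 128   /* upper bound only; the loop stops at the first
                          missing cpuN directory */

int main(void)
{
    char path[128], buf[256];
    int cpu;

    for (cpu = 0; cpu < MAX_CPUS; cpu++) {
        FILE *f;
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/cache/index3/shared_cpu_list",
                 cpu);
        f = fopen(path, "r");
        if (!f)
            break;  /* no such CPU (or no index3 cache): stop */
        if (fgets(buf, sizeof(buf), f))
            printf("cpu%-3d L3 shared with CPUs %s", cpu, buf);
        fclose(f);
    }
    return 0;
}

After the fix, each line should only list the cores that actually share
that L3; lstopo shows the same grouping graphically.
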

On Sat, Dec 30, 2017 at 8:59 AM, Brice Goglin <brice.gog...@inria.fr> wrote:

>
>
> On 29/12/2017 at 23:15, Bill Broadley wrote:
> >
> > Very interesting. I was running parallel finite-element code and was
> > seeing great performance compared to Intel in most cases, but on
> > larger runs it was 20x slower. This would explain it.
> >
> > Do you know which commit, or anything else that might help find any
> > related discussion? I tried a few Google searches without luck.
> >
> > Is it specific to the 24-core? The slowdown I described happened on a
> > 32-core Epyc single socket as well as a dual-socket 24-core AMD Epyc
> > system.
>
> Hello
>
> Yes, it's 24-core specific (that's the only core count that doesn't
> have 8 cores per Zeppelin module).
>
> The commit in Linux git master is 2b83809a5e6d619a780876fcaf68cdc42b50d28c
>
> Brice
>
>
> commit 2b83809a5e6d619a780876fcaf68cdc42b50d28c
> Author: Suravee Suthikulpanit <suravee.suthikulpa...@amd.com>
> Date:   Mon Jul 31 10:51:59 2017 +0200
>
>     x86/cpu/amd: Derive L3 shared_cpu_map from cpu_llc_shared_mask
>
>     For systems with X86_FEATURE_TOPOEXT, current logic uses the APIC ID
>     to calculate shared_cpu_map. However, APIC IDs are not guaranteed to
>     be contiguous for cores across different L3s (e.g. family17h system
>     w/ downcore configuration). This breaks the logic, and results in an
>     incorrect L3 shared_cpu_map.
>
>     Instead, always use the previously calculated cpu_llc_shared_mask of
>     each CPU to derive the L3 shared_cpu_map.
>
_______________________________________________
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-users
