Following up on this: indeed, with a recent kernel the error message goes away. The poor performance remains, though (only a few percent difference between 4.13 and 4.15-rc5), and I'm at a loss as to whether it's related to MPI or not. I see oddities such as locking the job to the first 12 cores yielding 100% better performance than locking it to the last 12 cores, which I can't explain but can only suspect is related to some kind of MPI cache-partitioning issue.
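In case it helps anyone reproduce the comparison, below is a rough hwloc-based sketch of my own (hypothetical, not what the actual job uses; compile with -lhwloc) that pins the current process to a contiguous range of cores, so the same run can be timed pinned to cores 0-11 versus 12-23. The argument convention and the default of 12 cores are just my choices, and the indexes are hwloc's logical core numbers, which may differ from the OS numbering:

#include <hwloc.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    /* Which cores to pin to, e.g. "./pin 0 12" vs "./pin 12 12". */
    int first = (argc > 1) ? atoi(argv[1]) : 0;
    int count = (argc > 2) ? atoi(argv[2]) : 12;

    hwloc_topology_t topo;
    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    /* Accumulate the cpusets of the requested cores into one bitmap. */
    hwloc_bitmap_t set = hwloc_bitmap_alloc();
    for (int i = first; i < first + count; i++) {
        hwloc_obj_t core = hwloc_get_obj_by_type(topo, HWLOC_OBJ_CORE, i);
        if (!core) {
            fprintf(stderr, "core %d not found\n", i);
            return 1;
        }
        hwloc_bitmap_or(set, set, core->cpuset);
    }

    /* Bind the whole process (all threads) to the selected cores. */
    if (hwloc_set_cpubind(topo, set, HWLOC_CPUBIND_PROCESS) < 0) {
        perror("hwloc_set_cpubind");
        return 1;
    }

    char *s;
    hwloc_bitmap_asprintf(&s, set);
    printf("bound to cpuset %s\n", s);
    free(s);

    hwloc_bitmap_free(set);
    hwloc_topology_destroy(topo);
    return 0;
}

Depending on where those 12 cores land relative to the L3 domains, the two halves of the machine may behave quite differently, which is exactly what the kernel fix quoted below is about.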
On Sat, Dec 30, 2017 at 8:59 AM, Brice Goglin <brice.gog...@inria.fr> wrote:
>
> Le 29/12/2017 à 23:15, Bill Broadley a écrit :
> >
> > Very interesting, I was running parallel finite element code and was seeing
> > great performance compared to Intel in most cases, but on larger runs it was
> > 20x slower. This would explain it.
> >
> > Do you know which commit, or anything else that might help find any related
> > discussion? I tried a few google searches without luck.
> >
> > Is it specific to the 24-core? The slowdown I described happened on a 32 core
> > Epyc single socket as well as a dual socket 24 core AMD Epyc system.
>
> Hello
>
> Yes it's 24-core specific (that's the only core-count that doesn't have
> 8-core per zeppelin module).
>
> The commit in Linux git master is 2b83809a5e6d619a780876fcaf68cdc42b50d28c
>
> Brice
>
>
> commit 2b83809a5e6d619a780876fcaf68cdc42b50d28c
> Author: Suravee Suthikulpanit <suravee.suthikulpa...@amd.com>
> Date:   Mon Jul 31 10:51:59 2017 +0200
>
>     x86/cpu/amd: Derive L3 shared_cpu_map from cpu_llc_shared_mask
>
>     For systems with X86_FEATURE_TOPOEXT, current logic uses the APIC ID
>     to calculate shared_cpu_map. However, APIC IDs are not guaranteed to
>     be contiguous for cores across different L3s (e.g. family17h system
>     w/ downcore configuration). This breaks the logic, and results in an
>     incorrect L3 shared_cpu_map.
>
>     Instead, always use the previously calculated cpu_llc_shared_mask of
>     each CPU to derive the L3 shared_cpu_map.
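For what it's worth, an easy way to check whether a given kernel reports the L3 shared_cpu_map correctly is to look at what it exposes in sysfs. A minimal sketch of my own (assuming the usual x86 layout where index3 is the L3 cache) that just dumps each CPU's L3 shared_cpu_list:

#include <stdio.h>

int main(void)
{
    char path[128], buf[256];

    /* Walk cpu0, cpu1, ... until the sysfs entry no longer exists. */
    for (int cpu = 0; ; cpu++) {
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/cache/index3/shared_cpu_list",
                 cpu);
        FILE *f = fopen(path, "r");
        if (!f)
            break;  /* no cpuN directory (or no L3 entry): stop */
        if (fgets(buf, sizeof(buf), f))
            printf("cpu%-3d shares its L3 with: %s", cpu, buf);
        fclose(f);
    }
    return 0;
}

On an affected 24-core Epyc the groups reported without the fix don't match the cores that actually share an L3; with the fix applied they should.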