Le 29/12/2017 à 23:15, Bill Broadley a écrit :
>
>
> Very interesting, I was running parallel finite element code and was seeing
> great performance compared to Intel in most cases, but on larger runs it was
> 20x slower.  This would explain it.
>
> Do you know which commit, or anything else that might help find any related
> discussion?  I tried a few google searches without luck.
>
> Is it specific to the 24-core?  The slowdown I described happened on a 32 core
> Epyc single socket as well as a dual socket 24 core AMD Epyc system.

Hello

Yes, it's specific to the 24-core parts (that's the only core count that
doesn't have 8 cores per Zeppelin module).

The commit in Linux git master is 2b83809a5e6d619a780876fcaf68cdc42b50d28c

Brice


commit 2b83809a5e6d619a780876fcaf68cdc42b50d28c
Author: Suravee Suthikulpanit <suravee.suthikulpa...@amd.com>
Date:   Mon Jul 31 10:51:59 2017 +0200

    x86/cpu/amd: Derive L3 shared_cpu_map from cpu_llc_shared_mask
    
    For systems with X86_FEATURE_TOPOEXT, current logic uses the APIC ID
    to calculate shared_cpu_map. However, APIC IDs are not guaranteed to
    be contiguous for cores across different L3s (e.g. family17h system
    w/ downcore configuration). This breaks the logic, and results in an
    incorrect L3 shared_cpu_map.
    
    Instead, always use the previously calculated cpu_llc_shared_mask of
    each CPU to derive the L3 shared_cpu_map.

_______________________________________________
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-users
