On 09/05/2016 23:58, Mehmet Belgin wrote:
> Greetings!
>
> We've been receiving this error for a while on our 64-core Interlagos
> AMD machines:
>
> ****************************************************************************
> * hwloc has encountered what looks like an error from the operating system.
> *
> * Socket (P#2 cpuset 0x0000ffff,0x0) intersects with NUMANode (P#3 cpuset 0x0000ff00,0xff000000) without inclusion!
> * Error occurred in topology.c line 940
> *
> * Please report this error message to the hwloc user's mailing list,
> * along with the output+tarball generated by the hwloc-gather-topology script.
> ****************************************************************************
>
>
> I've found some information in the hwloc list archives mentioning this
> is due to buggy AMD platform and the impact should be limited to hwloc
> missing L3 cache info (thanks Brice). If that's the case and processor
> representation is correct then I am sure we can live with this, but I
> still wanted to check with the list to confirm that (1) this is really
> harmless and (2) are there any known solutions other than upgrading
> BIOS/kernel?

Hello

The L3 bug only applies to 12-core Opteron 62xx/63xx, while you have
16-core Opterons. Your L3 locality is correct, but your NUMA locality is
wrong:
$ cat sys/devices/system/node/node*/cpumap
00000000,00ffffff
0000ff00,ff000000
000000ff,00000000
ffff0000,00000000
You should have something like this instead:
00000000,0000ffff
00000000,ffff0000
0000ffff,00000000
ffff0000,00000000
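
The difference between the two listings can be checked mechanically. The following is a minimal Python sketch, not part of hwloc: the helper names parse_cpumap and intersects_without_inclusion are made up for illustration, and it assumes the usual Linux sysfs cpumap layout of comma-separated 32-bit hex words, most significant word first. It decodes the masks quoted above and re-applies the "intersects without inclusion" test from the error message to the Socket cpuset.

```python
# Masks as printed by "cat sys/devices/system/node/node*/cpumap" above.
REPORTED = ["00000000,00ffffff", "0000ff00,ff000000",
            "000000ff,00000000", "ffff0000,00000000"]
# Masks suggested above as the correct layout.
EXPECTED = ["00000000,0000ffff", "00000000,ffff0000",
            "0000ffff,00000000", "ffff0000,00000000"]
# Socket cpuset copied from the hwloc error message.
SOCKET = "0x0000ffff,0x0"


def parse_cpumap(text):
    """Decode comma-separated 32-bit hex words (most significant first)
    into a set of CPU indices."""
    mask = 0
    for word in text.strip().split(","):
        mask = (mask << 32) | int(word, 16)  # int() accepts an optional 0x prefix
    return {cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1}


def intersects_without_inclusion(a, b):
    """True when the two CPU sets overlap but neither contains the other."""
    return bool(a & b) and not (a <= b or b <= a)


socket = parse_cpumap(SOCKET)
for label, masks in (("reported", REPORTED), ("expected", EXPECTED)):
    nodes = [parse_cpumap(m) for m in masks]
    sizes = [len(n) for n in nodes]
    conflicts = [i for i, n in enumerate(nodes)
                 if intersects_without_inclusion(socket, n)]
    print(label, "CPUs per node:", sizes, "conflicting list positions:", conflicts)
```

With the reported masks the nodes hold 24, 16, 8 and 16 CPUs, and the second mask in the listing straddles the Socket. With the expected masks every node holds 16 CPUs and no conflict remains.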

This bug is not harmless: with the wrong NUMA locality, memory buffers
have a good chance of being physically allocated far away from the cores
that use them.

This is more likely a BIOS bug than a kernel one. Try upgrading the BIOS.

Regards
Brice
