Thank you, Brice, for your quick reply! We will try a BIOS upgrade and share our findings with the list.

-Mehmet


On 5/9/16 6:10 PM, Brice Goglin wrote:
On 09/05/2016 23:58, Mehmet Belgin wrote:
Greetings!

We've been receiving this error for a while on our 64-core Interlagos
AMD machines:

****************************************************************************
* hwloc has encountered what looks like an error from the operating system.
*
* Socket (P#2 cpuset 0x0000ffff,0x0) intersects with NUMANode (P#3 cpuset 0x0000ff00,0xff000000) without inclusion!
* Error occurred in topology.c line 940
*
* Please report this error message to the hwloc user's mailing list,
* along with the output+tarball generated by the hwloc-gather-topology script.
****************************************************************************
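The "intersects without inclusion" wording means the two objects overlap but neither contains the other, so they cannot be nested in one tree. The following Python sketch illustrates that rule only; it is not hwloc's actual code in topology.c. It assumes the cpuset strings are comma-separated 32-bit hex words, most significant word first, and the helper names `parse_mask`, `cpus` and `relation` are hypothetical.

```python
def parse_mask(text: str) -> int:
    """Decode a cpuset/cpumap string made of comma-separated 32-bit hex words,
    most significant word first, e.g. "0x0000ffff,0x0" or "0000ff00,ff000000"."""
    value = 0
    for word in text.strip().split(","):
        # int(..., 16) accepts words with or without a "0x" prefix
        value = (value << 32) | int(word, 16)
    return value


def cpus(mask: int) -> list:
    """Return the indices of the CPUs whose bits are set in mask."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


def relation(a: int, b: int) -> str:
    """Classify two CPU sets the way the hwloc warning describes them."""
    if a & b == 0:
        return "disjoint"
    if a & ~b == 0 or b & ~a == 0:
        return "included"  # one set nests inside the other: consistent
    return "intersects without inclusion"


socket = parse_mask("0x0000ffff,0x0")        # Socket P#2: CPUs 32-47
numa = parse_mask("0x0000ff00,0xff000000")   # NUMANode P#3: CPUs 24-31 and 40-47

print(relation(socket, numa))  # intersects without inclusion
print(cpus(socket & numa))     # CPUs shared by both objects (40-47)
print(cpus(numa & ~socket))    # NUMA-node CPUs outside the socket (24-31)
```

The overlap (CPUs 40-47) plus the NUMA-node CPUs lying outside the socket (24-31) is exactly the combination that makes the two objects impossible to nest.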


I've found some information in the hwloc list archives mentioning that this
is due to a buggy AMD platform and that the impact should be limited to hwloc
missing the L3 cache info (thanks Brice). If that's the case and the processor
representation is correct, then I am sure we can live with this, but I
still wanted to check with the list to confirm (1) that this is really
harmless and (2) whether there are any known solutions other than upgrading
the BIOS/kernel.
Hello

The L3 bug only applies to 12-core Opteron 62xx/63xx, while you have
16-core Opterons. Your L3 locality is correct, but your NUMA locality is
wrong:
$ cat sys/devices/system/node/node*/cpumap
00000000,00ffffff
0000ff00,ff000000
000000ff,00000000
ffff0000,00000000
You should have something like this instead:
00000000,0000ffff
00000000,ffff0000
0000ffff,00000000
ffff0000,00000000
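To compare the two layouts at a glance, the cpumap strings can be decoded into CPU lists. This is a minimal Python sketch that assumes the Linux cpumap format (comma-separated 32-bit hex words, most significant word first) and uses the values quoted above, with the node index taken from the listing order node0..node3. The helper names `parse_mask`, `cpu_ranges` and `summarize` are hypothetical. On a running system, the kernel also exposes the same sets in list form in the `cpulist` file next to each `cpumap`.

```python
# Values copied from the cpumap listing above (node0..node3 in order).
ACTUAL = ["00000000,00ffffff", "0000ff00,ff000000",
          "000000ff,00000000", "ffff0000,00000000"]
EXPECTED = ["00000000,0000ffff", "00000000,ffff0000",
            "0000ffff,00000000", "ffff0000,00000000"]


def parse_mask(text: str) -> int:
    """Decode comma-separated 32-bit hex words, most significant word first."""
    value = 0
    for word in text.strip().split(","):
        value = (value << 32) | int(word, 16)
    return value


def cpu_ranges(mask: int) -> str:
    """Render the set bits of mask as a CPU list such as "24-31,40-47"."""
    ranges = []
    cpu = 0
    while mask >> cpu:  # stop once no set bits remain at or above cpu
        if (mask >> cpu) & 1:
            start = cpu
            while (mask >> cpu) & 1:
                cpu += 1
            ranges.append(f"{start}-{cpu - 1}" if cpu - 1 > start else str(start))
        else:
            cpu += 1
    return ",".join(ranges)


def summarize(cpumaps):
    """Return (node, CPU count, CPU list) for each node's cpumap string."""
    result = []
    for node, text in enumerate(cpumaps):
        mask = parse_mask(text)
        result.append((node, bin(mask).count("1"), cpu_ranges(mask)))
    return result


for label, maps in (("actual", ACTUAL), ("expected", EXPECTED)):
    for node, count, cpulist in summarize(maps):
        print(f"{label:8} node{node}: {count:2} CPUs  {cpulist}")
```

With the actual maps, node0 gets 24 CPUs (0-23) and node2 only 8 (32-39), while node1 is split into two ranges (24-31 and 40-47). The expected layout gives every node one contiguous block of 16 CPUs.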

This bug is not harmless since memory buffers have a good chance of
being physically allocated far away from your cores.

This is more likely a BIOS bug than a kernel bug. Try upgrading the BIOS.

Regards
Brice

_______________________________________________
hwloc-users mailing list
hwloc-us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-users
Link to this post: 
http://www.open-mpi.org/community/lists/hwloc-users/2016/05/1274.php

--
=========================================
Mehmet Belgin, Ph.D. (mehmet.bel...@oit.gatech.edu)
Scientific Computing Consultant | OIT - Academic and Research Technologies
Georgia Institute of Technology
258 4th Str NW, Rich Building, Room 326
Atlanta, GA  30332-0700
Office: (404) 385-0665
