On 28/07/2010 16:21, Bernd Kallies wrote:
> We just got an SGI UltraViolet rack containing 48 NUMA nodes with one
> octo-core Nehalem each, SMT switched on. Essentially it is a big
> shared-memory machine, similar to what SGI had with their Itanium-based
> Altix 4700.
>
> OS is SLES11 (2.6.32.12-0.7.1.1381.1.PTF-default x86_64). I used
> hwloc-1.0.2 compiled with gcc.
>
> The lstopo output looks a bit strange. The full output of lstopo is
> attached. It begins with
>
> Machine (1534GB)
>   Group4 #0 (1022GB)
>     Group3 #0 (510GB)
>       Group2 #0 (254GB)
>         Group1 #0 (126GB)
>           Group0 #0 (62GB)
>             NUMANode #0 (phys=0 30GB) + Socket #0 + L3 #0 (24MB)
>               L2 #0 (256KB) + L1 #0 (32KB) + Core #0
>                 PU #0 (phys=0)
>                 PU #1 (phys=384)
>               L2 #1 (256KB) + L1 #1 (32KB) + Core #1
>                 PU #2 (phys=1)
>                 PU #3 (phys=385)
>               L2 #2 (256KB) + L1 #2 (32KB) + Core #2
> ...
>             NUMANode #1 (phys=1 32GB) + Socket #1 + L3 #1 (24MB)
>               L2 #8 (256KB) + L1 #8 (32KB) + Core #8
>                 PU #16 (phys=8)
>                 PU #17 (phys=392)
>               L2 #9 (256KB) + L1 #9 (32KB) + Core #9
> ...
>
> The output essentially says that there are 48 NUMA nodes with 8 cores
> each. Each NUMA node contains 32 GB memory except the 1st one, which
> contains 30 GB. Two NUMA nodes are grouped together as "Group0". Two
> "Group0" are grouped together as "Group1" and so on. There are three
> "Group3" objects, the 1st one contains 16 NUMA nodes with 510 GB, the
> remaining two contain 16 NUMA nodes with 512 GB each. Up to here the
> topology is understandable. I'm wondering about "Group4", which
> contains the three "Group3" objects. lstopo should print "1534GB"
> instead of "1022GB". There is only one "Group4" object, and there are no
> other direct children of the root object.
>   

Indeed, there's something wrong.
Can you send the output of tests/linux/gather_topology.sh so that I can
try to debug this from here?
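
In the meantime, if you want to see at which level the memory accounting
goes wrong, a quick walk over each depth that sums obj->memory.total_memory
should show where the 1534GB turn into 1022GB. A rough sketch, untested
here and assuming the 1.x obj->memory fields:

/* Sketch only: print, for every depth, the number of objects and the
 * summed total_memory, so the level where the sum stops matching the
 * machine's 1534GB becomes visible. */
#include <hwloc.h>
#include <stdio.h>

int main(void)
{
    hwloc_topology_t topology;
    unsigned depth, i, nbobjs;

    hwloc_topology_init(&topology);
    hwloc_topology_load(topology);

    for (depth = 0; depth < hwloc_topology_get_depth(topology); depth++) {
        unsigned long long sum = 0;
        nbobjs = hwloc_get_nbobjs_by_depth(topology, depth);
        for (i = 0; i < nbobjs; i++)
            sum += hwloc_get_obj_by_depth(topology, depth, i)->memory.total_memory;
        printf("depth %u: %u x %s, total memory %llu GB\n",
               depth, nbobjs,
               hwloc_obj_type_string(hwloc_get_obj_by_depth(topology, depth, 0)->type),
               sum >> 30);
    }

    hwloc_topology_destroy(topology);
    return 0;
}

Compile with gcc test.c -lhwloc (plus -I/-L if hwloc is installed in a
non-default prefix).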

> Moreover, when running applications that use the hwloc API and call
> functions like hwloc_get_next_obj_by_depth or hwloc_get_obj_by_depth,
> and then call hwloc_topology_destroy or even free() on some
> self-allocated memory, the app fails at that stage with
>
> *** glibc detected *** a.out: double free or corruption (out).
> or
> *** glibc detected *** a.out: free(): invalid next size (fast):
>   

Can you send an example as well?
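
Something as small as the following pattern would be perfect, if it still
triggers the abort. This is only a sketch of the pattern you describe, not
your code; the core depth and the malloc size are placeholders:

/* Sketch of a reduced test case: iterate objects with
 * hwloc_get_next_obj_by_depth(), then destroy the topology and free a
 * self-allocated buffer, which is reportedly where glibc aborts. */
#include <hwloc.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    hwloc_topology_t topology;
    hwloc_obj_t obj = NULL;
    char *buf = malloc(4096); /* arbitrary self-allocated memory */

    hwloc_topology_init(&topology);
    hwloc_topology_load(topology);

    /* walk all objects at the core depth, for instance */
    unsigned depth = hwloc_get_type_or_below_depth(topology, HWLOC_OBJ_CORE);
    while ((obj = hwloc_get_next_obj_by_depth(topology, depth, obj)) != NULL)
        printf("%s #%u\n", hwloc_obj_type_string(obj->type), obj->logical_index);

    hwloc_topology_destroy(topology);
    free(buf); /* reported failure point */
    return 0;
}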

thanks,
Brice
