Hi,

I ran into a problem when running OMPI v1.8.1 -- a divide-by-zero crash deep in 
the hwloc code called by OMPI.  The system I'm running on is a Simics x86_64 
emulator with RHEL 6.3.  I can reproduce the error by running lstopo from hwloc 
v1.9:

[root@viper0 bin]# LD_LIBRARY_PATH=/root/hwloc/lib ./lstopo -v
Floating point exception (core dumped)


Hwloc v1.1rc6, already installed on the system, and a corresponding OMPI 1.6.5 
build both work with no problems:

[root@viper0 bin]# lstopo --version
lstopo 1.1rc6
[root@viper0 bin]# lstopo -v
Machine (P#0 local=2055580KB total=2055580KB DMIProductName=Bochs 
DMIProductVersion= DMIProductSerial= DMIChassisVendor=Bochs DMIChassisType=1 
DMIChassisVersion= DMIChassisSerial= DMIChassisAssetTag= DMIBIOSVendor=Bochs 
DMIBIOSVersion=Bochs DMIBIOSDate=01/01/2007 DMIS)
  Socket L#0 (P#0)
    L3Cache L#0 (8192KB line=64)
      L2Cache L#0 (256KB line=64)
        L1Cache L#0 (32KB line=64)
          Core L#0 (P#0)
            PU L#0 (P#0)
depth 0:        1 Machine (type #1)
 depth 1:       1 Socket (type #3)
  depth 2:      1 Cache (type #4)
   depth 3:     1 Cache (type #4)
    depth 4:    1 Cache (type #4)
     depth 5:   1 Core (type #5)
      depth 6:  1 PU (type #6)


Here's the output from a GDB session on hwloc v1.9:

[root@viper0 bin]# LD_LIBRARY_PATH=/root/hwloc/lib gdb ./lstopo
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-56.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /root/hwloc/bin/lstopo...done.
(gdb) r -v
Starting program: /root/hwloc/bin/lstopo -v
warning: no loadable sections found in added symbol-file system-supplied DSO at 
0x7ffff7ffd000

Program received signal SIGFPE, Arithmetic exception.
0x00007ffff7df0558 in look_proc (infos=0x61b6a0, highest_cpuid=11, 
highest_ext_cpuid=<value optimized out>, features=<value optimized out>, 
cpuid_type=intel)
    at topology-x86.c:323
323           infos->threadid = infos->logprocid % infos->max_nbthreads;
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.80.el6.x86_64
(gdb) bt
#0  0x00007ffff7df0558 in look_proc (infos=0x61b6a0, highest_cpuid=11, 
highest_ext_cpuid=<value optimized out>, features=<value optimized out>,
    cpuid_type=intel) at topology-x86.c:323
#1  0x00007ffff7df165a in look_procs (topology=0x619100, nbprocs=1, 
fulldiscovery=0) at topology-x86.c:741
#2  hwloc_look_x86 (topology=0x619100, nbprocs=1, fulldiscovery=0) at 
topology-x86.c:886
#3  0x00007ffff7df17f9 in hwloc_x86_discover (backend=<value optimized out>) at 
topology-x86.c:934
#4  0x00007ffff7dd6568 in hwloc_discover (topology=0x619100) at topology.c:2452
#5  hwloc_topology_load (topology=0x619100) at topology.c:2925
#6  0x0000000000402cf0 in main (argc=<value optimized out>, argv=<value 
optimized out>) at lstopo.c:581
 (gdb) print infos->logprocid
$1 = 0
(gdb) print infos->max_nbthreads
$2 = 0
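
For reference, here is a minimal sketch of what the two printed values imply 
for line 323: the modulo has a zero divisor, which is undefined behaviour in C 
and raises SIGFPE on x86.  The struct and function names below are made up for 
illustration and are not hwloc's actual code; only the field names logprocid, 
max_nbthreads and threadid come from the GDB output above.  The guard is just 
one possible way to avoid the crash, not a proposed fix.

```c
/* Hypothetical stand-in for the fields read at topology-x86.c:323.
 * This is NOT hwloc's real struct, only the three fields seen in GDB. */
struct procinfo_sketch {
    unsigned logprocid;
    unsigned max_nbthreads;
    unsigned threadid;
};

/* Hypothetical helper: computes threadid the same way line 323 does,
 * but refuses a zero thread count instead of dividing by it.
 * Returns 0 on success, -1 if max_nbthreads is 0 (threadid untouched). */
int compute_threadid_sketch(struct procinfo_sketch *infos)
{
    if (infos->max_nbthreads == 0) {
        /* Integer % 0 is undefined in C; on x86 it traps as SIGFPE. */
        return -1;
    }
    infos->threadid = infos->logprocid % infos->max_nbthreads;
    return 0;
}
```

With logprocid = 0 and max_nbthreads = 0, as in my session, this sketch 
returns -1 instead of trapping.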


Any ideas?  Any other info I should provide?

Thanks,

Andrew
