On 04/11/2013 11:44, Paul Kapinos wrote:
> Hello all,
>
> I.
> Sorry for this paleontologic excursion. (The 4-year-old 'lstopo' binary was still sitting in my private bin folder and was still runnable.)
>
> Attached is the output of the newer version 1.5 (the Linux default one on RHEL/6.4 (SL/6.4)).
>
> II.
> I've also tested hwloc-1.5.2 (I could not find v1.5.3) and hwloc-1.7.2, as Brice suggested, with 'configure' + 'make test'; logs attached.
>
> 1.5.2 fails:
> >/bin/sh: line 5: 20677 Segmentation fault (core dumped) ${dir}$tst
> >FAIL: xmlbuffer

Can you give more details about this segfault? Try (from the build tree):
$ libtool --mode=execute gdb xmlbuffer
then type 'run', and when it crashes, type 'bt full' and send the output.

Then please also run the following from hwloc 1.5.2:
* "lstopo foo.xml" and send "foo.xml"
* "hwloc-gather-topology foo" and send "foo.tar.bz2"

> whereas 1.7.2 seems to be OK.
>
> AFAIK, the hwloc version in OpenMPI 1.7.4 is going to be updated? If so, that should fix the original issue, right?

Hard to say before we get details about the crash in xmlbuffer above.

Brice

> Many thanks for your help!
> Best
>
> Paul
>
> pk224850@linuxitvc00:~/SVN/mpifasttest/trunk[511]lstopo 1.5
> $ lstopo lstopo_linuxitvc00_1.5.txt
> $ lstopo lstopo_linuxitvc00_1.5.xml
>
> On 11/01/13 15:37, Brice Goglin wrote:
>> Sorry, I missed the mail on OMPI-users.
>>
>> This hwloc looks veeeeeeeeeeeery old. We haven't had Misc objects instead of Groups since we switched from 0.9 to 1.0. You should regenerate the XML file with a hwloc version that came out after the big bang (or better, after the asteroid killed the dinosaurs). Please resend that XML from a recent hwloc so that we can get a better clue about the problem.
>>
>> Assuming there's a bug in OMPI's hwloc, I would suggest downloading hwloc 1.5.3 and running make check on that machine. Then try again with hwloc 1.7.2 in case it's already fixed there.
>>
>> Thanks
>> Brice
>>
>> On 01/11/2013 15:24, Jeff Squyres (jsquyres) wrote:
>>> Paul Kapinos originally reported this issue on the OMPI users list.
>>>
>>> He is showing a stack trace from OMPI-1.7.3, which uses hwloc 1.5.2 (note that OMPI 1.7.4 will use hwloc 1.7.2).
>>>
>>> I tried to read the XML file he provided with git master HEAD of hwloc, and it fails:
>>>
>>> -----
>>> ❯❯❯ ./utils/lstopo -i lstopo_linuxitvc00.xml
>>> ignoring depth attribute for object type without depth
>>> ignoring depth attribute for object type without depth
>>> XML component discovery failed.
>>> hwloc_topology_load() failed (Invalid argument).
>>> -----
>>>
>>> Any idea what's happening here?
>>>
>>> BTW, I can apply the fix to both the OMPI SVN trunk and the v1.7 branch (since OMPI v1.7 is now up to hwloc 1.7.2).
>>>
>>> On Oct 31, 2013, at 1:28 PM, Paul Kapinos <kapi...@rz.rwth-aachen.de> wrote:
>>>
>>> > Hello all,
>>> >
>>> > Using 1.7.x (1.7.2 and 1.7.3 tested), we get a SIGSEGV from somewhere deep inside the hwloc library; see the attached screenshot.
>>> >
>>> > Because the error is strongly tied to one single node, which in turn is a rather special one (see the output of 'lstopo -'), it smells like a bug in the hwloc library.
>>> >
>>> > Is there a way to disable hwloc or to debug it somehow?
>>> > (Besides building a debug version of hwloc and OpenMPI.)
>>> >
>>> > Best
>>> >
>>> > Paul
>>> >
>>> > --
>>> > Dipl.-Inform.
>>> > Paul Kapinos - High Performance Computing,
>>> > RWTH Aachen University, Center for Computing and Communication
>>> > Seffenter Weg 23, D 52074 Aachen (Germany)
>>> > Tel: +49 241/80-24915
>>> >
>>> <lstopo_linuxitvc00.txt><opal_hwlock_SIGSEGV.png><lstopo_linuxitvc00.xml>
>>>
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com