On Sunday, July 11, 2010 07:57:48 pm Brice Goglin wrote:
> On 11/07/2010 19:48, Jirka Hladky wrote:
> > Hi all,
> >
> > I have run into two bugs on PPC64 on 2.6.32 kernel.
> >
> > Version:
> >
> > lt-lstopo 1.0.1
> >
> > BUG #1: No Socket information in lstopo output:
> >
> > ./lstopo
> >
> > Machine (3654MB) + L2 #0 (4096KB)
> >   L1 #0 (64KB) + Core #0
> >     PU #0 (phys=0)
> >     PU #1 (phys=1)
> >   L1 #1 (64KB) + Core #1
> >     PU #2 (phys=2)
> >     PU #3 (phys=3)
> >
> > Fixed in the latest version (tried hwloc-1.1a1r2301.tar.gz)
> > <http://www.open-mpi.org/software/hwloc/nightly/trunk/hwloc-1.1a1r2301.tar.gz>
>
> In 1.0.1, there's a patch that prevents us from showing invalid socket
> info on old kernels but it also prevents us from showing valid socket
> info on recent kernels. I reverted the commit in trunk (and in the
> upcoming 1.0.2).

Thanks for shedding some light on it!

> > BUG #2
> >
> > On some PPC64 boxes with kernel 2.6.32, I get a crash when running
> >
> > $ lstopo a.txt
> > Segmentation fault (core dumped)
> >
> > $ gdb /usr/local/bin/lstopo core.8771
> > Program terminated with signal 11, Segmentation fault.
> > #0 0x00000000100060b4 in .merge ()
> >
> > It appears only on some PPC64 boxes.
> >
> > This issue is also gone in the latest version (tried
> > hwloc-1.1a1r2301.tar.gz)
> > <http://www.open-mpi.org/software/hwloc/nightly/trunk/hwloc-1.1a1r2301.tar.gz>
> >
> > I wonder if you are aware of these problems. Let me know if you need
> > more details.
>
> If you do "lstopo a.xml" first, does "lstopo --xml a.xml a.txt" crash as
> above? If so, please send a.xml so that I can debug this.

Yes, it crashes the same way:

$ ./lstopo --version
lt-lstopo 1.0.1

$ ./lstopo --xml /tmp/2010-Jul-10_22h14m_results/2.6.32-44.el6.ppc64_OS-indexing.xml a.txt
Segmentation fault (core dumped)

The XML was generated with

lstopo --physical a.xml

Output of the command "lstopo --physical -":

Machine (4096MB)
  NUMANode p#0 (2240MB)
    L1 (64KB) + Core p#0
      PU p#0
      PU p#1
    L1 (64KB) + Core p#2
      PU p#2
      PU p#3
    L1 (64KB) + Core p#4
      PU p#4
      PU p#5
    L1 (64KB) + Core p#6
      PU p#6
      PU p#7
  NUMANode p#1 (1856MB)

Note the missing socket.

I will attach:

- the XML causing the crash (2.6.32-44.el6.ppc64_OS-indexing.xml)
- the whole run directory. Notice that png, pdf, ... are created (no
  crash) but are empty. Other formats are OK (check .fig). Please notice
  that hwloc-distrib is also not working correctly; check
  CPU_AFFINITY/0008.log for example.
- runtest.sh, the script used to create the data.

Let me know if you need more data.

Thanks!
Jirka

2.6.32-44.el6.ppc64_OS-indexing.xml
Description: XML document
2010-Jul-10_22h14m_hwloc-results.tar.gz
Description: application/compressed-tar
runtest.sh
Description: application/shellscript