On Nov 27, 2007, at 5:13 PM, Terry Frankcombe wrote:

==20671== Conditional jump or move depends on uninitialised value(s)
==20671==    at 0x40152B1: (within /lib/ld-2.5.so)
==20671==    by 0x400A289: (within /lib/ld-2.5.so)
==20671==    by 0x6A42E4D: (within /lib/libc-2.5.so)
==20671==    by 0x59AE0E3: (within /lib/libdl-2.5.so)
==20671==    by 0x400D725: (within /lib/ld-2.5.so)
==20671==    by 0x59AE4EC: (within /lib/libdl-2.5.so)
==20671==    by 0x59AE099: dlsym (in /lib/libdl-2.5.so)
==20671==    by 0x57610FB: vm_sym (in /usr/local/lib/libopen-pal.so.0.0.0)
==20671==    by 0x575E29E: lt_dlsym (in /usr/local/lib/libopen-pal.so.0.0.0)
==20671==    by 0x57666EF: open_component (in /usr/local/lib/libopen-pal.so.0.0.0)
==20671==    by 0x576711B: mca_base_component_find (in /usr/local/lib/libopen-pal.so.0.0.0)
==20671==    by 0x5767A9F: mca_base_components_open (in /usr/local/lib/libopen-pal.so.0.0.0)

This looks particularly broken!

I've just run valgrind on another (serial) piece of code on this machine and got three of the uninitialised jumps from within ld-2.5.so, virtually identical to the first three from this MPI code. Of the 24 from the MPI code, those seeming to originate from within OpenMPI are particularly worrying.

These are usually false positives -- in my [not comprehensive] experience, they typically come from valgrind trying to analyze optimized code for which full debugging information is not available. For example, the snippet above reports a supposedly uninitialized variable reached through the system library function dlsym(). I strongly suspect that this is not a real problem.

As for valgrind not finding your real problem -- bummer. It can't always find everything. :-( Perhaps try Electric Fence and/or other kinds of "watch" actions to see exactly when variables change (that might give insight into whether a buffer is being overflowed, etc.)?
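
To make the "watch" idea a bit more concrete, here is a minimal sketch in C of a hand-rolled canary check around a buffer. Everything in it (struct guarded_buf, guarded_init, guarded_ok, simulate_overrun, the guard value) is a hypothetical name made up for illustration; none of it comes from your code, Open MPI, or Electric Fence, and a real tool does this far more thoroughly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUARD_WORD  0xDEADBEEFu  /* arbitrary canary value (assumption) */
#define PAYLOAD_LEN 16

/* Hypothetical guarded buffer: a canary word on each side of the payload. */
struct guarded_buf {
    uint32_t front;
    char     data[PAYLOAD_LEN];
    uint32_t back;
};

/* Set both canaries and zero the payload. */
static void guarded_init(struct guarded_buf *g)
{
    g->front = GUARD_WORD;
    memset(g->data, 0, sizeof g->data);
    g->back = GUARD_WORD;
}

/* Returns 1 while both canaries are intact, 0 once either was overwritten. */
static int guarded_ok(const struct guarded_buf *g)
{
    return g->front == GUARD_WORD && g->back == GUARD_WORD;
}

/* Simulates an overrun of `extra` bytes past the end of data.
 * The write goes through a byte pointer to the whole struct and is clamped
 * to the trailing canary, so it stays inside the object and is well defined.
 * A real overflow bug would not be so polite. */
static void simulate_overrun(struct guarded_buf *g, size_t extra)
{
    unsigned char *bytes = (unsigned char *)g;
    if (extra > sizeof g->back)
        extra = sizeof g->back;
    memset(bytes + offsetof(struct guarded_buf, data), 'X', PAYLOAD_LEN + extra);
}
```

Calling something like guarded_ok() at a few checkpoints (say, before and after each communication step) would narrow down which stretch of code clobbers the buffer, assuming the overflow is a small linear overrun rather than a wild write.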

--
Jeff Squyres
Cisco Systems
