On Nov 27, 2007, at 5:13 PM, Terry Frankcombe wrote:
==20671== Conditional jump or move depends on uninitialised value(s)
==20671== at 0x40152B1: (within /lib/ld-2.5.so)
==20671== by 0x400A289: (within /lib/ld-2.5.so)
==20671== by 0x6A42E4D: (within /lib/libc-2.5.so)
==20671== by 0x59AE0E3: (within /lib/libdl-2.5.so)
==20671== by 0x400D725: (within /lib/ld-2.5.so)
==20671== by 0x59AE4EC: (within /lib/libdl-2.5.so)
==20671== by 0x59AE099: dlsym (in /lib/libdl-2.5.so)
==20671== by 0x57610FB: vm_sym (in /usr/local/lib/libopen-pal.so.0.0.0)
==20671== by 0x575E29E: lt_dlsym (in /usr/local/lib/libopen-pal.so.0.0.0)
==20671== by 0x57666EF: open_component (in /usr/local/lib/libopen-pal.so.0.0.0)
==20671== by 0x576711B: mca_base_component_find (in /usr/local/lib/libopen-pal.so.0.0.0)
==20671== by 0x5767A9F: mca_base_components_open (in /usr/local/lib/libopen-pal.so.0.0.0)
This looks particularly broken!
I've just run valgrind on another (serial) piece of code on this machine
and got three of the uninitialised-value jump warnings from within
ld-2.5.so, virtually identical to the first three from this MPI code. Of
the 24 warnings from the MPI code, those that seem to originate from
within OpenMPI are particularly worrying.
These are usually false positives. In my (not comprehensive)
experience, they typically result from valgrind analyzing optimized
code for which not all of the debugging information is available, which
leads it to report false positives. For example, the snippet above comes
from a supposedly uninitialized variable reached through the library
function dlsym(). I strongly suspect that this is not a real problem.
As for valgrind not finding your real problem: bummer. It can't always
find everything. :-( Perhaps try Electric Fence and/or other kinds of
"watch" actions (e.g., debugger watchpoints) to see exactly when
variables change. That might give insight into whether a buffer is being
overflowed, etc.
--
Jeff Squyres
Cisco Systems