On Dec 16, 2005, at 10:47 AM, Greg Watson wrote:

I finally worked out why I couldn't reproduce the problem. You're not
going to like it though.

You're right -- this kind of buglet is among the most un-fun.  :-(

Here's the stack trace from the core file:

#0  0x00e93fe8 in orte_pls_rsh_launch ()
    from /usr/local/ompi/lib/openmpi/mca_pls_rsh.so
#1  0x0023c642 in orte_rmgr_urm_spawn ()
    from /usr/local/ompi/lib/openmpi/mca_rmgr_urm.so
#2  0x0804a0d4 in orterun (argc=5, argv=0xbfab2e84) at orterun.c:373
#3  0x08049b16 in main (argc=5, argv=0xbfab2e84) at main.c:13

Can you recompile this one file with -g? Specifically, cd into the orte/mca/pls/rsh dir and "make clean". Then "make". Then cut-n-paste the compile line for that one file to a shell prompt, and put in a -g.

Then either re-install that component (it looks like you're doing a dynamic build with separate components, so you can do "make install" right from the rsh dir) or re-link liborte and re-install that and re-run. The corefile might give something a little more meaningful in this case...?
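
In case it helps, here is a rough sketch of those steps as shell commands. The directory is assumed to be orte/mca/pls/rsh (going by the mca_pls_rsh.so name in the stack trace), the compile line shown is only a placeholder rather than what your build will actually print, and add_debug_flag is a made-up helper for illustration. If the real compile line goes through libtool or is wrapped in an "if ...; then" construct, it is probably easier to add the -g by hand.

```shell
# Steps from the message, shown as comments because they need the
# Open MPI source tree (directory name is an assumption, see above):
#   cd orte/mca/pls/rsh
#   make clean
#   make              # note the compile line printed for the one file
#   <paste that compile line here with -g added>
#   make install      # re-installs just this component in a dynamic build

# Hypothetical helper: append -g to a pasted compile line.
# gcc accepts options anywhere on the command line, so the end is fine.
add_debug_flag() {
    printf '%s -g\n' "$1"
}

# Placeholder compile line, NOT the real one from the build output.
compile_line='gcc -O3 -c example.c -o example.o'
debug_line=$(add_debug_flag "$compile_line")
echo "$debug_line"
```

After re-installing and re-running, loading the new core file in the debugger should show file and line information for the orte_pls_rsh_launch frame, provided the rebuilt component is the one that gets loaded.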

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/


