On Dec 16, 2005, at 10:47 AM, Greg Watson wrote:
I finally worked out why I couldn't reproduce the problem. You're not going to like it though.
You're right -- this kind of buglet is among the most un-fun. :-(
Here's the stack trace from the core file:

#0 0x00e93fe8 in orte_pls_rsh_launch () from /usr/local/ompi/lib/openmpi/mca_pls_rsh.so
#1 0x0023c642 in orte_rmgr_urm_spawn () from /usr/local/ompi/lib/openmpi/mca_rmgr_urm.so
#2 0x0804a0d4 in orterun (argc=5, argv=0xbfab2e84) at orterun.c:373
#3 0x08049b16 in main (argc=5, argv=0xbfab2e84) at main.c:13
Can you recompile this one file with -g? Specifically, cd into the orte/mca/pls/rsh dir and run "make clean", then "make". Then cut-n-paste the compile line for that one file into a shell prompt, add -g, and run it.
Then re-install: either just that component (it looks like you're doing a dynamic build with separate components, so you can run "make install" right from the rsh dir), or re-link and re-install liborte. Then re-run; the core file might give something a little more meaningful this time...?
--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/