Hello all,
I recently checked in code for xcpu, so xcpu can now be used as one of the launchers within Open MPI.
It works fine, but I am hitting one problem.
In trunk/orte/tools/orterun/totalview.c, on line 402,
proc->proc_node comes back NULL, which causes mpirun to segfault. If I change line 402 from

MPIR_proctable[i].host_name = proc->proc_node->node->node_name;

to

if (proc->proc_node) {
    MPIR_proctable[i].host_name = proc->proc_node->node->node_name;
}


it works fine.
I am not sure why proc->proc_node ends up NULL here. Any input would be appreciated.
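
For reference, a slightly fuller version of the guard might look like the sketch below. This is only a sketch: the NULL fallback for host_name is my assumption (I have not checked whether MPIR_proctable is zeroed on allocation), not existing totalview.c behavior.

/* Guard both pointer hops before dereferencing; if the node entry is
 * missing, fall back to NULL rather than reading through a NULL
 * proc_node. The fallback value is an assumption, not upstream code. */
if (proc->proc_node != NULL && proc->proc_node->node != NULL) {
    MPIR_proctable[i].host_name = proc->proc_node->node->node_name;
} else {
    MPIR_proctable[i].host_name = NULL;  /* assumed-safe placeholder */
}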

Thanks a lot.
-Sushant

----------------------------------------------------------

Here is the gdb output for mpirun:


(gdb) run --mca pls xcpu --hostfile /home/sushant/ompi/my-tests/hostfile -np 1 /home/sushant/ompi/my-tests/hello.o
Starting program: /home/sushant/ompi/install/bin/mpirun --mca pls xcpu --hostfile /home/sushant/ompi/my-tests/hostfile -np 1 /home/sushant/ompi/my-tests/hello.o
[Thread debugging using libthread_db enabled]
[New Thread -1210691456 (LWP 8117)]
[New Thread -1211511888 (LWP 8120)]
[New Thread -1219900496 (LWP 8126)]
[New Thread -1228289104 (LWP 8127)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1210691456 (LWP 8117)]
0x0804d6b9 in orte_totalview_init_after_spawn (jobid=1) at ../../../../trunk/orte/tools/orterun/totalview.c:402
402         MPIR_proctable[i].host_name = proc->proc_node->node->node_name;
(gdb) where
#0  0x0804d6b9 in orte_totalview_init_after_spawn (jobid=1)
    at ../../../../trunk/orte/tools/orterun/totalview.c:402
#1  0x0804af92 in job_state_callback (jobid=1, state=4)
    at ../../../../trunk/orte/tools/orterun/orterun.c:638
#2  0xb7d5b8bc in orte_rmgr_urm_callback (data=0x80c7f00, cbdata=0x804aee8)
    at ../../../../../trunk/orte/mca/rmgr/urm/rmgr_urm.c:282
#3  0xb7ced98d in orte_gpr_replica_deliver_notify_msg (msg=0x80c7ed0)
    at ../../../../../../trunk/orte/mca/gpr/replica/api_layer/gpr_replica_deliver_notify_msg_api.c:134
#4  0xb7cf68b9 in orte_gpr_replica_process_callbacks ()
    at ../../../../../../trunk/orte/mca/gpr/replica/functional_layer/gpr_replica_messaging_fn.c:80
#5  0xb7d0221f in orte_gpr_replica_recv (status=1564, sender=0x80670a0, buffer=0xbfbc2820, tag=2, cbdata=0x0)
    at ../../../../../../trunk/orte/mca/gpr/replica/communications/gpr_replica_recv_proxy_msgs.c:85
#6  0xb7f74b4a in mca_oob_recv_callback (status=1564, peer=0x80670a0, msg=0x8083ec0, count=1, tag=2, cbdata=0x8083ec0)
    at ../../../../trunk/orte/mca/oob/base/oob_base_recv_nb.c:159
#7  0xb7d2e8ec in mca_oob_tcp_msg_data (msg=0x8068460, peer=0x8067080)
    at ../../../../../trunk/orte/mca/oob/tcp/oob_tcp_msg.c:487
#8  0xb7d2e506 in mca_oob_tcp_msg_recv_complete (msg=0x8068460, peer=0x8067080)
    at ../../../../../trunk/orte/mca/oob/tcp/oob_tcp_msg.c:396
#9  0xb7d31cf2 in mca_oob_tcp_peer_recv_handler (sd=10, flags=2, user=0x8067080)
    at ../../../../../trunk/orte/mca/oob/tcp/oob_tcp_peer.c:715
#10 0xb7f0990a in opal_event_process_active () at ../../../trunk/opal/event/event.c:428
#11 0xb7f09bc1 in opal_event_loop (flags=1) at ../../../trunk/opal/event/event.c:513
#12 0xb7f02d81 in opal_progress () at ../../trunk/opal/runtime/opal_progress.c:259
#13 0x0804c976 in opal_condition_wait (c=0x804fa90, m=0x804fa64) at condition.h:81
#14 0x0804a660 in orterun (argc=9, argv=0xbfbc2b24) at ../../../../trunk/orte/tools/orterun/orterun.c:415
#15 0x08049e76 in main (argc=9, argv=0xbfbc2b24) at ../../../../trunk/orte/tools/orterun/main.c:13
(gdb)
