Fixed - r26406

On May 7, 2012, at 10:35 PM, Eugene Loh wrote:

> Here is another trunk hang. I get it if I use at least three remote nodes.
> E.g., with r26385:
>
> % mpirun -H remoteA,remoteB,remoteC -n 2 hostname
> [remoteA:20508] [[54625,0],1] ORTE_ERROR_LOG: Not found in file
> base/ess_base_fns.c at line 135
> [remoteA:20508] [[54625,0],1] unable to get hostname for daemon 3
> [remoteA:20508] [[54625,0],1] ORTE_ERROR_LOG: Not found in file
> orted/orted_comm.c at line 345
> [hang]
>
> I think the problem first appeared with r26359.
>
> I guess if a remote orted has to spawn another orted, it gets here:
>
> opal_pointer_array_get_item(table = 0x7e410, element_index = 3), line 136 in
> "opal_pointer_array.h"
> find_proc(proc = 0xffbff264), line 51 in "ess_base_fns.c"
> orte_ess_base_proc_get_hostname(proc = 0xffbff264), line 134 in
> "ess_base_fns.c"
> remote_spawn(launch = 0x85f30), line 812 in "plm_rsh_module.c"
> orte_daemon_recv(status = 0, sender = 0x85f54, buffer = 0x85f30, tag = 1U,
> cbdata = (nil)), line 344 in "orted_comm.c"
> orte_rml_recv_msg_callback(status = 0, peer = 0x69014, iov = 0x7d7e0, count
> = 2, tag = 1U, cbdata = 0x85ec0), line 68 in "rml_oob_recv.c"
> mca_oob_tcp_msg_data(msg = 0x85310, peer = 0x69000), line 436 in
> "oob_tcp_msg.c"
> mca_oob_tcp_msg_recv_complete(msg = 0x85310, peer = 0x69000), line 322 in
> "oob_tcp_msg.c"
> mca_oob_tcp_peer_recv_handler(sd = 13, flags = 2, user = 0x69000), line 942
> in "oob_tcp_peer.c"
> event_persist_closure(base = 0x3c600, ev = 0x647a8), line 1280 in "event.c"
> event_process_active_single_queue(base = 0x3c600, activeq = 0x3c4f0), line
> 1324 in "event.c"
> event_process_active(base = 0x3c600), line 1396 in "event.c"
> opal_libevent2013_event_base_loop(base = 0x3c600, flags = 1), line 1593 in
> "event.c"
> orte_daemon(argc = 19, argv = 0xffbff97c), line 729 in "orted_main.c"
> main(argc = 19, argv = 0xffbff97c), line 62 in "orted.c"
>
> So, in my case, I'm trying to look up item 3 while only item 1 in the array
> appears to be initialized.
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel