Your suggestion worked, Ralph. I only added:

    OBJ_RELEASE(buffer);
    buffer = OBJ_NEW(opal_buffer_t);

Thank you both for your help.

Hugo

2011/3/8 George Bosilca <bosi...@eecs.utk.edu>

> The stack trace indicates that your orted segfaulted in
> orte_odls_base_notify_iof_complete, which means it received a message that
> was interpreted as an ORTE_DAEMON_IOF_COMPLETE (21). Unfortunately, there
> is nothing more to get out of your output.
>
> george.
>
> On Mar 8, 2011, at 08:15 , Hugo Meyer wrote:
>
> > Hello @ll.
> >
> > I have a problem with a communication between
> > v_protocol_receiver_component.c and orted_comm.c.
> >
> > In mca_vprotocol_receiver_component_init I've added a request that is
> > received correctly by orte_daemon_process_commands, but when I try to
> > reply to the sender I get the following error:
> >
> > [clus1:15593] [ 0] /lib64/libpthread.so.0 [0x2aaaabb03d40]
> > [clus1:15593] [ 1] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0 [0x2aaaaad760db]
> > [clus1:15593] [ 2] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0 [0x2aaaaad75aa4]
> > [clus1:15593] [ 3] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/openmpi/mca_errmgr_orted.so [0x2aaaae2d2fdd]
> > [clus1:15593] [ 4] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(orte_odls_base_notify_iof_complete+0x1da) [0x2aaaaad42cb0]
> > [clus1:15593] [ 5] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(orte_daemon_process_commands+0x1068) [0x2aaaaad19ca6]
> > [clus1:15593] [ 6] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(orte_daemon_cmd_processor+0x81b) [0x2aaaaad18a55]
> > [clus1:15593] [ 7] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0 [0x2aaaaad9710e]
> > [clus1:15593] [ 8] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0 [0x2aaaaad974bb]
> > [clus1:15593] [ 9] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(opal_event_loop+0x1a) [0x2aaaaad972ad]
> > [clus1:15593] [10] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(opal_event_dispatch+0xe) [0x2aaaaad97166]
> > [clus1:15593] [11] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(orte_daemon+0x2322) [0x2aaaaad17556]
> > [clus1:15593] [12] /home/hmeyer/desarrollo/radic-ompi/binarios/bin/orted [0x4008a3]
> > [clus1:15593] [13] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2aaaabd2d8a4]
> > [clus1:15593] [14] /home/hmeyer/desarrollo/radic-ompi/binarios/bin/orted [0x400799]
> > [clus1:15593] *** End of error message ***
> >
> > The code that I've added in v_protocol_receiver_component.c is below
> > (the receive that fails, originally in bold, is the
> > orte_rml.recv_buffer() call):
> >
> >     int mca_vprotocol_receiver_request_protector(void) {
> >         orte_daemon_cmd_flag_t command;
> >         opal_buffer_t *buffer = NULL;
> >         int n = 1;
> >
> >         command = ORTE_DAEMON_REQUEST_PROTECTOR_CMD;
> >
> >         buffer = OBJ_NEW(opal_buffer_t);
> >         opal_dss.pack(buffer, &command, 1, ORTE_DAEMON_CMD);
> >
> >         orte_rml.send_buffer(ORTE_PROC_MY_DAEMON, buffer, ORTE_RML_TAG_DAEMON, 0);
> >
> >         orte_rml.recv_buffer(ORTE_PROC_MY_DAEMON, buffer, ORTE_DAEMON_REQUEST_PROTECTOR_CMD, 0);
> >         opal_dss.unpack(buffer, &mca_vprotocol_receiver.protector.jobid, &n, OPAL_UINT32);
> >         opal_dss.unpack(buffer, &mca_vprotocol_receiver.protector.vpid, &n, OPAL_UINT32);
> >
> >         orte_process_info.protector.jobid = mca_vprotocol_receiver.protector.jobid;
> >         orte_process_info.protector.vpid = mca_vprotocol_receiver.protector.vpid;
> >
> >         OBJ_RELEASE(buffer);
> >
> >         return OMPI_SUCCESS;
> >     }
> >
> > The code that I've added in orted_comm.c is below (the send that fails,
> > originally in bold, is the orte_rml.send_buffer() call):
> >
> >     case ORTE_DAEMON_REQUEST_PROTECTOR_CMD:
> >         if (orte_debug_daemons_flag) {
> >             opal_output(0, "%s orted_recv: received request protector from local proc %s",
> >                         ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), ORTE_NAME_PRINT(sender));
> >         }
> >         /* Define the protector */
> >         protector = (uint32_t)ORTE_PROC_MY_NAME->vpid + 1;
> >         if (protector >= (uint32_t)orte_process_info.num_procs) {
> >             protector = 0;
> >         }
> >
> >         /* Pack the protector data */
> >         answer = OBJ_NEW(opal_buffer_t);
> >
> >         if (ORTE_SUCCESS != (ret = opal_dss.pack(answer, &ORTE_PROC_MY_NAME->jobid, 1, OPAL_UINT32))) {
> >             ORTE_ERROR_LOG(ret);
> >             OBJ_RELEASE(answer);
> >             goto CLEANUP;
> >         }
> >         if (ORTE_SUCCESS != (ret = opal_dss.pack(answer, &protector, 1, OPAL_UINT32))) {
> >             ORTE_ERROR_LOG(ret);
> >             OBJ_RELEASE(answer);
> >             goto CLEANUP;
> >         }
> >         if (orte_debug_daemons_flag) {
> >             opal_output(0, "ASSIGNED PROTECTOR for %s IS: %d\n",
> >                         ORTE_NAME_PRINT(sender), protector);
> >         }
> >
> >         /* Send the protector data */
> >         if (0 > orte_rml.send_buffer(sender, answer, ORTE_DAEMON_REQUEST_PROTECTOR_CMD, 0)) {
> >             ORTE_ERROR_LOG(ORTE_ERR_COMM_FAILURE);
> >             ret = ORTE_ERR_COMM_FAILURE;
> >             OBJ_RELEASE(answer);
> >             goto CLEANUP;
> >         }
> >         OBJ_RELEASE(answer);
> >
> > From testing, I assume the error is in those two calls, maybe because I'm
> > missing a step when I try to communicate, or maybe this kind of
> > communication cannot be done. Any help will be appreciated.
> >
> > Thanks a lot.
> >
> > Hugo Meyer
> >
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
> "I disapprove of what you say, but I will defend to the death your right to
> say it"
>   -- Evelyn Beatrice Hall
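
The fix reported at the top of the thread (release the request buffer and allocate a new one, presumably between the send and the receive) can be illustrated with the small self-contained C sketch below. It does not use ORTE or OPAL at all: toy_buffer_t, toy_pack_u32(), toy_unpack_u32(), toy_recv_append(), daemon_reply() and request_protector() are hypothetical stand-ins for opal_buffer_t, opal_dss.pack/unpack and orte_rml.send_buffer/recv_buffer, and the command value 99 is made up. In particular, toy_recv_append() assumes a receive that appends the incoming payload to whatever buffer it is handed; the thread does not say how ORTE's recv_buffer actually treats a non-empty buffer. Under that assumption, reusing the request buffer makes the first unpack return the stale command value instead of the jobid. The protector computation follows the ring rule from the orted_comm.c snippet (vpid + 1, wrapping to 0).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_CAPACITY 16

/* Hypothetical stand-in for opal_buffer_t: packed uint32 values plus a
 * read cursor. This is NOT the OPAL DSS API. */
typedef struct {
    uint32_t data[TOY_CAPACITY];
    size_t len;    /* number of values packed so far */
    size_t cursor; /* index of the next value to unpack */
} toy_buffer_t;

static toy_buffer_t *toy_buffer_new(void) { return calloc(1, sizeof(toy_buffer_t)); }
static void toy_buffer_release(toy_buffer_t *buf) { free(buf); } /* free(NULL) is a no-op */

static int toy_pack_u32(toy_buffer_t *buf, uint32_t value) {
    if (buf->len >= TOY_CAPACITY) return -1;
    buf->data[buf->len++] = value;
    return 0;
}

static int toy_unpack_u32(toy_buffer_t *buf, uint32_t *value) {
    if (buf->cursor >= buf->len) return -1;
    *value = buf->data[buf->cursor++];
    return 0;
}

/* ASSUMPTION for this sketch: a receive appends the incoming payload
 * to the end of the buffer it is given. */
static int toy_recv_append(toy_buffer_t *dst, const toy_buffer_t *msg) {
    for (size_t i = 0; i < msg->len; i++) {
        if (toy_pack_u32(dst, msg->data[i]) != 0) return -1;
    }
    return 0;
}

/* Ring rule from the orted_comm.c case: the protector is the next vpid,
 * wrapping to 0 after the last daemon. */
static uint32_t protector_for(uint32_t my_vpid, uint32_t num_procs) {
    assert(num_procs > 0);
    uint32_t protector = my_vpid + 1;
    if (protector >= num_procs) protector = 0;
    return protector;
}

/* Hypothetical daemon side: pack jobid and protector into a new reply buffer. */
static toy_buffer_t *daemon_reply(uint32_t jobid, uint32_t my_vpid, uint32_t num_procs) {
    toy_buffer_t *answer = toy_buffer_new();
    if (answer == NULL) return NULL;
    if (toy_pack_u32(answer, jobid) != 0 ||
        toy_pack_u32(answer, protector_for(my_vpid, num_procs)) != 0) {
        toy_buffer_release(answer);
        return NULL;
    }
    return answer;
}

/* Hypothetical client side. reuse_request_buffer == 0 applies the fix from
 * the thread: release the request buffer and allocate a new one before
 * receiving the reply. Returns 0 on success. */
static int request_protector(int reuse_request_buffer, uint32_t *jobid, uint32_t *protector) {
    const uint32_t request_cmd = 99; /* hypothetical command value */
    toy_buffer_t *buffer = toy_buffer_new();
    if (buffer == NULL || toy_pack_u32(buffer, request_cmd) != 0) {
        toy_buffer_release(buffer);
        return -1;
    }
    /* The send itself is not modeled; the daemon's reply is built directly
     * (jobid 7, daemon vpid 3, 4 daemons in total). */
    toy_buffer_t *reply = daemon_reply(7, 3, 4);
    if (reply == NULL) {
        toy_buffer_release(buffer);
        return -1;
    }
    if (!reuse_request_buffer) {
        toy_buffer_release(buffer); /* like OBJ_RELEASE(buffer); */
        buffer = toy_buffer_new();  /* like buffer = OBJ_NEW(opal_buffer_t); */
        if (buffer == NULL) {
            toy_buffer_release(reply);
            return -1;
        }
    }
    int rc = toy_recv_append(buffer, reply);
    toy_buffer_release(reply);
    if (rc == 0) rc = toy_unpack_u32(buffer, jobid);
    if (rc == 0) rc = toy_unpack_u32(buffer, protector);
    toy_buffer_release(buffer);
    return rc;
}
```

With the values hardcoded in the sketch, request_protector(0, ...) yields jobid 7 and protector 0, while request_protector(1, ...) yields 99 (the stale command) and 7. Giving the client its own reply buffer mirrors what the orted side already does with its separate answer buffer, so the request and the reply never share state.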