Hello all,

I have a problem with the communication between *v_protocol_receiver_component.c* and *orted_comm.c*.
In *mca_vprotocol_receiver_component_init* I've added a request that is received correctly by *orte_daemon_process_commands*, but when I try to reply to the sender I get the following error:

[clus1:15593] [ 0] /lib64/libpthread.so.0 [0x2aaaabb03d40]
[clus1:15593] [ 1] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0 [0x2aaaaad760db]
[clus1:15593] [ 2] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0 [0x2aaaaad75aa4]
[clus1:15593] [ 3] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/openmpi/mca_errmgr_orted.so [0x2aaaae2d2fdd]
[clus1:15593] [ 4] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(orte_odls_base_notify_iof_complete+0x1da) [0x2aaaaad42cb0]
[clus1:15593] [ 5] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(orte_daemon_process_commands+0x1068) [0x2aaaaad19ca6]
[clus1:15593] [ 6] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(orte_daemon_cmd_processor+0x81b) [0x2aaaaad18a55]
[clus1:15593] [ 7] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0 [0x2aaaaad9710e]
[clus1:15593] [ 8] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0 [0x2aaaaad974bb]
[clus1:15593] [ 9] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(opal_event_loop+0x1a) [0x2aaaaad972ad]
[clus1:15593] [10] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(opal_event_dispatch+0xe) [0x2aaaaad97166]
[clus1:15593] [11] /home/hmeyer/desarrollo/radic-ompi/binarios/lib/libopen-rte.so.0(orte_daemon+0x2322) [0x2aaaaad17556]
[clus1:15593] [12] /home/hmeyer/desarrollo/radic-ompi/binarios/bin/orted [0x4008a3]
[clus1:15593] [13] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2aaaabd2d8a4]
[clus1:15593] [14] /home/hmeyer/desarrollo/radic-ompi/binarios/bin/orted [0x400799]
[clus1:15593] *** End of error message ***

The code that I've added to *v_protocol_receiver_component.c* is (in bold, the recv command that fails):

    int mca_vprotocol_receiver_request_protector(void)
    {
        orte_daemon_cmd_flag_t command;
        opal_buffer_t *buffer = NULL;
        int n = 1;

        command = ORTE_DAEMON_REQUEST_PROTECTOR_CMD;

        buffer = OBJ_NEW(opal_buffer_t);
        opal_dss.pack(buffer, &command, 1, ORTE_DAEMON_CMD);

        orte_rml.send_buffer(ORTE_PROC_MY_DAEMON, buffer, ORTE_RML_TAG_DAEMON, 0);

        *orte_rml.recv_buffer(ORTE_PROC_MY_DAEMON, buffer, ORTE_DAEMON_REQUEST_PROTECTOR_CMD, 0);*

        opal_dss.unpack(buffer, &mca_vprotocol_receiver.protector.jobid, &n, OPAL_UINT32);
        opal_dss.unpack(buffer, &mca_vprotocol_receiver.protector.vpid, &n, OPAL_UINT32);

        orte_process_info.protector.jobid = mca_vprotocol_receiver.protector.jobid;
        orte_process_info.protector.vpid = mca_vprotocol_receiver.protector.vpid;

        OBJ_RELEASE(buffer);

        return OMPI_SUCCESS;
    }

The code that I've added to *orted_comm.c* is (in bold, the send command that fails):

    case ORTE_DAEMON_REQUEST_PROTECTOR_CMD:
        if (orte_debug_daemons_flag) {
            opal_output(0, "%s orted_recv: received request protector from local proc %s",
                        ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), ORTE_NAME_PRINT(sender));
        }

        /* Define the protector */
        protector = (uint32_t)ORTE_PROC_MY_NAME->vpid + 1;
        if (protector >= (uint32_t)orte_process_info.num_procs) {
            protector = 0;
        }

        /* Pack the protector data */
        answer = OBJ_NEW(opal_buffer_t);
        if (ORTE_SUCCESS != (ret = opal_dss.pack(answer, &ORTE_PROC_MY_NAME->jobid, 1, OPAL_UINT32))) {
            ORTE_ERROR_LOG(ret);
            OBJ_RELEASE(answer);
            goto CLEANUP;
        }
        if (ORTE_SUCCESS != (ret = opal_dss.pack(answer, &protector, 1, OPAL_UINT32))) {
            ORTE_ERROR_LOG(ret);
            OBJ_RELEASE(answer);
            goto CLEANUP;
        }
        if (orte_debug_daemons_flag) {
            opal_output(0, "EL PROTECTOR ASIGNADO para %s ES: %d\n",
                        ORTE_NAME_PRINT(sender), protector);
        }

        /* Send the protector data */
        *if (0 > orte_rml.send_buffer(sender, answer, ORTE_DAEMON_REQUEST_PROTECTOR_CMD, 0)) {*
        *    ORTE_ERROR_LOG(ORTE_ERR_COMM_FAILURE);*
        *    ret = ORTE_ERR_COMM_FAILURE;*
        *    OBJ_RELEASE(answer);*
        *    goto CLEANUP;*
        }
        OBJ_RELEASE(answer);

From my testing, I assume the error is in the bolded sections, maybe because I'm missing some statement when I try to communicate, or maybe this kind of communication cannot be done at all.

Any help will be appreciated.

Thanks a lot.

Hugo Meyer
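
P.S. In case it helps to reason about the handler separately from the messaging, the protector selection in the orted code above is just a wrap-around successor on the daemon's vpid. The following is a minimal standalone sketch of that step only. The function name next_protector and its parameters are hypothetical and not part of Open MPI; it does not touch the RML send/recv calls that are failing.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not an Open MPI function): mirrors the
 * "Define the protector" step of the orted handler.
 * The protector of a daemon is the next vpid, wrapping to 0
 * once the successor reaches num_procs.
 */
uint32_t next_protector(uint32_t my_vpid, uint32_t num_procs)
{
    uint32_t protector = my_vpid + 1;

    if (protector >= num_procs) {
        protector = 0;
    }
    return protector;
}
```

With four daemons, vpids 0, 1, 2 and 3 map to protectors 1, 2, 3 and 0 respectively.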