Hi,

When forwarding stdin to all ranks in the job (mpirun --stdin all), the following error message is output:
------------------
[berlin73:02223] [[56600,0],0] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file ../../../../../orte/mca/rml/oob/rml_oob_send.c at line 316
[berlin73:02223] [[56600,0],0] unable to find address for [[INVALID],INVALID]
[berlin73:02223] [[56600,0],0] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file ../../../../../orte/mca/iof/hnp/iof_hnp_send.c at line 116
------------------

This happens because the daemon part of the sink structure is not initialized in hnp_push() when the destination vpid is ORTE_VPID_WILDCARD. Later, when orte_iof_hnp_read_local_handler() is called, it calls orte_iof_hnp_send_data_to_endpoint() with a sink->daemon that is not set. As a result, orte_iof_hnp_send_data_to_endpoint() does not call orte_grpcomm.xcast() but instead calls orte_rml.send_buffer_nb() with an invalid host.

The attached patch, applied on the trunk, solves the issue. The patch is trivial, but since this is the first time I have had to look at the iof code, I am not sure of all its impacts...

Regards,
Nadia
daemon part of the sink structure is not initialized when forwarding stdin to all ranks

diff -r 490e6afa37fe orte/mca/iof/hnp/iof_hnp.c
--- a/orte/mca/iof/hnp/iof_hnp.c	Tue Mar 06 11:56:15 2012 +0100
+++ b/orte/mca/iof/hnp/iof_hnp.c	Tue Mar 06 12:43:44 2012 +0100
@@ -263,6 +263,8 @@ static int hnp_push(const orte_process_n
         ORTE_IOF_SINK_DEFINE(&sink, dst_name, -1, ORTE_IOF_STDIN,
                              stdin_write_handler,
                              &mca_iof_hnp_component.sinks);
+        sink->daemon.jobid = ORTE_PROC_MY_NAME->jobid;
+        sink->daemon.vpid = ORTE_VPID_WILDCARD;
     } else {
         /* no - lookup the proc's daemon and set that into sink */
         if (NULL == (jdata = orte_get_job_data_object(dst_name->jobid))) {
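
For readers unfamiliar with the iof code, the failure described above can be illustrated with a small standalone model. This is only a sketch and not the ORTE sources: the types, constants, and function names (name_t, sink_t, VPID_WILDCARD, route_for(), push_stdin_sink(), and so on) are hypothetical stand-ins, and the routing rule is assumed from the description in this mail (a wildcard daemon vpid leads to an xcast, anything else to a direct send).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the real ORTE definitions. */
#define JOBID_INVALID UINT32_MAX
#define VPID_INVALID  UINT32_MAX
#define VPID_WILDCARD (UINT32_MAX - 1)

typedef struct { uint32_t jobid; uint32_t vpid; } name_t;

typedef struct {
    name_t name;    /* destination process */
    name_t daemon;  /* daemon that hosts the destination */
} sink_t;

typedef enum { ROUTE_XCAST, ROUTE_DIRECT, ROUTE_ERROR } route_t;

/* Model of sink creation: the daemon field starts out invalid. */
void sink_init(sink_t *s, name_t dst)
{
    s->name = dst;
    s->daemon.jobid = JOBID_INVALID;
    s->daemon.vpid = VPID_INVALID;
}

/* Assumed routing rule: wildcard host -> xcast; invalid host -> the
 * direct send fails; otherwise a direct send to that daemon. */
route_t route_for(const name_t *host)
{
    if (host->vpid == VPID_WILDCARD) {
        return ROUTE_XCAST;
    }
    if (host->jobid == JOBID_INVALID || host->vpid == VPID_INVALID) {
        return ROUTE_ERROR;
    }
    return ROUTE_DIRECT;
}

/* Model of the wildcard branch of the push path. With apply_fix == 0
 * the daemon field is left as sink_init() set it (the reported bug);
 * with apply_fix != 0 it is filled in the way the patch does. */
void push_stdin_sink(sink_t *s, name_t dst, uint32_t my_jobid, int apply_fix)
{
    sink_init(s, dst);
    if (dst.vpid == VPID_WILDCARD && apply_fix) {
        s->daemon.jobid = my_jobid;
        s->daemon.vpid = VPID_WILDCARD;
    }
}

/* Exercises both variants of the model. */
void run_checks(void)
{
    name_t all_ranks = { 1u, VPID_WILDCARD };
    sink_t s;

    push_stdin_sink(&s, all_ranks, 0u, 0);
    assert(route_for(&s.daemon) == ROUTE_ERROR);  /* unpatched model */

    push_stdin_sink(&s, all_ranks, 0u, 1);
    assert(route_for(&s.daemon) == ROUTE_XCAST);  /* patched model */
}
```

Calling run_checks() from a main() shows the two outcomes side by side: without the two added assignments the model ends up on the error path, and with them the wildcard destination is routed to the xcast path. Whether the real code has other callers that rely on the daemon field being unset is not covered by this sketch.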