Hi,

When forwarding stdin to all ranks in the job (mpirun --stdin all),
mpirun prints the following error messages:

------------------
[berlin73:02223] [[56600,0],0] ORTE_ERROR_LOG: A message is attempting
to be sent to a process whose contact information is unknown in
file ../../../../../orte/mca/rml/oob/rml_oob_send.c at line 316
[berlin73:02223] [[56600,0],0] unable to find address for
[[INVALID],INVALID]
[berlin73:02223] [[56600,0],0] ORTE_ERROR_LOG: A message is attempting
to be sent to a process whose contact information is unknown in
file ../../../../../orte/mca/iof/hnp/iof_hnp_send.c at line 116
------------------

This happens because hnp_push() does not initialize the daemon part of
the sink structure when the destination vpid is ORTE_VPID_WILDCARD.
Later, when orte_iof_hnp_read_local_handler() is called, it calls
orte_iof_hnp_send_data_to_endpoint() with a sink->daemon that is not
set.
As a result, orte_iof_hnp_send_data_to_endpoint() does not call
orte_grpcomm.xcast() but orte_rml.send_buffer_nb() with an invalid host.
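
To make the failure path easier to follow, here is a small standalone
model of it. This is only a sketch and not the real ORTE code: name_t,
sink_t, route_t, make_sink(), choose_route() and all the constants are
hypothetical stand-ins, and I am assuming that the send path picks
xcast only when the daemon vpid is the wildcard.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the ORTE name and sink types. */
typedef struct { uint32_t jobid; uint32_t vpid; } name_t;
typedef struct { name_t name; name_t daemon; } sink_t;

/* Illustrative values only, not the real ORTE constants. */
#define JOBID_INVALID 0xFFFFFFFFu
#define VPID_INVALID  0xFFFFFFFFu
#define VPID_WILDCARD 0xFFFFFFFEu
#define MY_JOBID      56600u

typedef enum {
    ROUTE_XCAST,             /* broadcast to all daemons */
    ROUTE_SEND_OK,           /* point-to-point send to a known daemon */
    ROUTE_SEND_UNKNOWN_PEER  /* point-to-point send to an invalid name */
} route_t;

/* Models the wildcard branch of hnp_push(): without the fix the daemon
 * name keeps its invalid default, with the fix it is set explicitly. */
sink_t make_sink(int patched)
{
    sink_t s;
    s.name.jobid   = MY_JOBID;
    s.name.vpid    = VPID_WILDCARD;
    s.daemon.jobid = JOBID_INVALID;
    s.daemon.vpid  = VPID_INVALID;
    if (patched) {
        s.daemon.jobid = MY_JOBID;
        s.daemon.vpid  = VPID_WILDCARD;
    }
    return s;
}

/* Models the choice made on the send side (assumed, see above). */
route_t choose_route(name_t daemon)
{
    if (daemon.vpid == VPID_WILDCARD) {
        return ROUTE_XCAST;
    }
    if (daemon.jobid == JOBID_INVALID || daemon.vpid == VPID_INVALID) {
        return ROUTE_SEND_UNKNOWN_PEER;
    }
    return ROUTE_SEND_OK;
}
```

In this model, choose_route(make_sink(0).daemon) ends up in the
point-to-point branch with an invalid name, which corresponds to the
"contact information is unknown" error above, while
choose_route(make_sink(1).daemon) takes the xcast branch.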

The attached patch, applied on the trunk, solves the issue. The patch is
trivial, but since this is the first time I have looked at the iof code,
I'm not sure of all its impacts...

Regards,
Nadia
daemon part of the sink structure is not initialized when forwarding stdin to all ranks

diff -r 490e6afa37fe orte/mca/iof/hnp/iof_hnp.c
--- a/orte/mca/iof/hnp/iof_hnp.c	Tue Mar 06 11:56:15 2012 +0100
+++ b/orte/mca/iof/hnp/iof_hnp.c	Tue Mar 06 12:43:44 2012 +0100
@@ -263,6 +263,8 @@ static int hnp_push(const orte_process_n
         ORTE_IOF_SINK_DEFINE(&sink, dst_name, -1, ORTE_IOF_STDIN,
                              stdin_write_handler,
                              &mca_iof_hnp_component.sinks);
+        sink->daemon.jobid = ORTE_PROC_MY_NAME->jobid;
+        sink->daemon.vpid = ORTE_VPID_WILDCARD;
     } else {
         /* no - lookup the proc's daemon and set that into sink */
         if (NULL == (jdata = orte_get_job_data_object(dst_name->jobid))) {
