I just pushed a solution to this problem in 8d0baf140f. If we are unable to
extract the expected information from the RTE, we simply build a
non-reordered communicator and gracefully return.
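
As a rough illustration only (this is not the code from the commit, and
the names plan_ranks, node_id and new_rank are hypothetical), the fallback
can be sketched as follows, assuming a negative node id stands for "the
RTE could not tell us", like the -1 values reported below:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an Open MPI function.
 * node_id[i] is the node id of rank i, or a negative value if unknown.
 * Fills new_rank[] with a permutation of 0..n-1.
 * Returns 1 if ranks were grouped by node, 0 if we fell back. */
int plan_ranks(const int *node_id, int n, int *new_rank)
{
    assert(node_id != NULL && new_rank != NULL && n >= 0);

    /* If any node id is missing, give up on reordering:
     * keep the identity mapping and return gracefully. */
    for (int i = 0; i < n; i++) {
        if (node_id[i] < 0) {
            for (int j = 0; j < n; j++) {
                new_rank[j] = j;
            }
            return 0;
        }
    }

    /* Otherwise give consecutive new ranks to ranks on the same node. */
    for (int i = 0; i < n; i++) {
        new_rank[i] = -1;               /* -1 marks "not placed yet" */
    }
    int next = 0;
    for (int i = 0; i < n; i++) {
        if (new_rank[i] != -1) {
            continue;                   /* already placed with its node */
        }
        for (int j = i; j < n; j++) {
            if (node_id[j] == node_id[i]) {
                new_rank[j] = next++;
            }
        }
    }
    return 1;
}
```

With node ids {0, 1, 0, 1} the sketch groups ranks 0 and 2 together; with
{-1, -1} it leaves the ranks untouched and reports the fallback.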

That being said, failing to correctly retrieve OPAL_PMIX_NODEID has the
potential to drastically decrease performance, as no specialized
hierarchies can be built without the RTE information.

  George.


On Wed, Aug 10, 2016 at 3:57 AM, Gilles Gouaillardet <gil...@rist.or.jp>
wrote:

> Ralph,
>
>
> I noticed dist-graph/distgraph_test_4 from the ibm test suite fails when
> using a hostfile and running no task on the host running mpirun.
>
> n0$ mpirun --host n1:1,n2:1 -np 2 ./dist-graph/distgraph_test_4
>
>
> The root cause is that OPAL_PMIX_NODEID is correctly set (0, 1, 2) by
> mpirun, but for some reason orted sets it to -1 everywhere.
>
> An indirect consequence is a crash of the test (it believes tasks run on
> zero distinct nodes instead of 2).
>
>
> This occurs only on master; v2.x is fine.
>
>
> Could you please have a look?
>
>
> Cheers,
>
>
> Gilles
>
> _______________________________________________
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>