Hi Brian,

I tested this rc using both srun native launch and mpirun on the following
systems:
- LANL CTS-1 systems (Haswell + Intel OPA/PSM2)
- LANL network testbed system (Haswell + ConnectX-5, using both UCX and OB1)
- LANL Cray XC

I am finding some problems with mpirun on the network testbed system.

For example, with the spawn_with_env_vars test from the IBM test suite:

*** Error in `mpirun': corrupted double-linked list: 0x00000000006e75b0 ***
======= Backtrace: =========
/usr/lib64/libc.so.6(+0x7bea2)[0x7ffff6597ea2]
/usr/lib64/libc.so.6(+0x7cec6)[0x7ffff6598ec6]
/home/hpp/openmpi_3.0.0rc1_install/lib/libopen-pal.so.40(opal_proc_table_remove_all+0x91)[0x7ffff7855851]
/home/hpp/openmpi_3.0.0rc1_install/lib/openmpi/mca_oob_ud.so(+0x5e09)[0x7ffff3cc0e09]
/home/hpp/openmpi_3.0.0rc1_install/lib/openmpi/mca_oob_ud.so(+0x5952)[0x7ffff3cc0952]
/home/hpp/openmpi_3.0.0rc1_install/lib/libopen-rte.so.40(+0x6b032)[0x7ffff7b94032]
/home/hpp/openmpi_3.0.0rc1_install/lib/libopen-pal.so.40(mca_base_framework_close+0x7d)[0x7ffff788592d]
/home/hpp/openmpi_3.0.0rc1_install/lib/openmpi/mca_ess_hnp.so(+0x3e4d)[0x7ffff5b04e4d]
/home/hpp/openmpi_3.0.0rc1_install/lib/libopen-rte.so.40(orte_finalize+0x79)[0x7ffff7b43bf9]
mpirun[0x4014f1]
mpirun[0x401018]
/usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x7ffff653db15]
mpirun[0x400f29]

and another like this:

[hpp@hi-master dynamic (master *)]$mpirun -np 1 ./spawn_with_env_vars
Spawning...
Spawned
Child got foo and baz env variables -- yay!
*** Error in `mpirun': corrupted double-linked list: 0x00000000006eb350 ***
======= Backtrace: =========
/usr/lib64/libc.so.6(+0x7b184)[0x7ffff6597184]
/usr/lib64/libc.so.6(+0x7d1ec)[0x7ffff65991ec]
/home/hpp/openmpi_3.0.0rc1_install/lib/openmpi/mca_oob_tcp.so(+0x57a2)[0x7ffff32297a2]
/home/hpp/openmpi_3.0.0rc1_install/lib/openmpi/mca_oob_tcp.so(+0x5a87)[0x7ffff3229a87]
/home/hpp/openmpi_3.0.0rc1_install/lib/libopen-rte.so.40(+0x6b032)[0x7ffff7b94032]
/home/hpp/openmpi_3.0.0rc1_install/lib/libopen-pal.so.40(mca_base_framework_close+0x7d)[0x7ffff788592d]
/home/hpp/openmpi_3.0.0rc1_install/lib/openmpi/mca_ess_hnp.so(+0x3e4d)[0x7ffff5b04e4d]
/home/hpp/openmpi_3.0.0rc1_install/lib/libopen-rte.so.40(orte_finalize+0x79)[0x7ffff7b43bf9]
mpirun[0x4014f1]
mpirun[0x401018]
/usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x7ffff653db15]
mpirun[0x400f29]

It doesn't happen on every run, though. Both backtraces point at heap corruption while an oob component (ud in the first case, tcp in the second) is being closed via mca_base_framework_close during orte_finalize.
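
For context, the test is exercising MPI_Comm_spawn; the application output is complete and correct, and the corruption only shows up when mpirun finalizes its frameworks afterward. The general shape of the test is roughly the sketch below (not the actual IBM source; the self-spawn and the use of mpirun -x to get foo/baz into the child are just illustrative stand-ins):

/* Minimal sketch of a spawn-with-env-vars style test.  NOT the actual
 * IBM test: the self-spawn and the assumption that foo/baz are exported
 * to the child (e.g. with "mpirun -x foo -x baz") are illustrative only. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Comm parent, intercomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (MPI_COMM_NULL == parent) {
        /* Parent: spawn a single child copy of this binary. */
        printf("Spawning...\n");
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                       MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);
        printf("Spawned\n");
        MPI_Comm_disconnect(&intercomm);
    } else {
        /* Child: verify the environment variables made it across. */
        if (getenv("foo") && getenv("baz")) {
            printf("Child got foo and baz env variables -- yay!\n");
        }
        MPI_Comm_disconnect(&parent);
    }

    MPI_Finalize();
    return 0;
}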

I'll do some more investigating, but probably not till next week.

Howard


2017-06-28 11:50 GMT-06:00 Barrett, Brian via devel <
devel@lists.open-mpi.org>:

> The first release candidate of Open MPI 3.0.0 is now available (
> https://www.open-mpi.org/software/ompi/v3.0/).  We expect to have at
> least one more release candidate, as there are still outstanding MPI-layer
> issues to be resolved (particularly around one-sided).  We are posting
> 3.0.0rc1 to get feedback on run-time stability, as one of the big features
> of Open MPI 3.0 is the update to the PMIx 2 runtime environment.  We would
> appreciate any and all testing you can do around run-time behaviors.
>
> Thank you,
>
> Brian & Howard