Hi Brian,

I tested this rc using both srun native launch and mpirun on the following systems:

- LANL CTS-1 systems (Haswell + Intel OPA/PSM2)
- LANL network testbed system (Haswell + ConnectX-5/UCX and OB1)
- LANL Cray XC
I am finding some problems with mpirun on the network testbed system. For example, running spawn_with_env_vars from the IBM tests:

*** Error in `mpirun': corrupted double-linked list: 0x00000000006e75b0 ***
======= Backtrace: =========
/usr/lib64/libc.so.6(+0x7bea2)[0x7ffff6597ea2]
/usr/lib64/libc.so.6(+0x7cec6)[0x7ffff6598ec6]
/home/hpp/openmpi_3.0.0rc1_install/lib/libopen-pal.so.40(opal_proc_table_remove_all+0x91)[0x7ffff7855851]
/home/hpp/openmpi_3.0.0rc1_install/lib/openmpi/mca_oob_ud.so(+0x5e09)[0x7ffff3cc0e09]
/home/hpp/openmpi_3.0.0rc1_install/lib/openmpi/mca_oob_ud.so(+0x5952)[0x7ffff3cc0952]
/home/hpp/openmpi_3.0.0rc1_install/lib/libopen-rte.so.40(+0x6b032)[0x7ffff7b94032]
/home/hpp/openmpi_3.0.0rc1_install/lib/libopen-pal.so.40(mca_base_framework_close+0x7d)[0x7ffff788592d]
/home/hpp/openmpi_3.0.0rc1_install/lib/openmpi/mca_ess_hnp.so(+0x3e4d)[0x7ffff5b04e4d]
/home/hpp/openmpi_3.0.0rc1_install/lib/libopen-rte.so.40(orte_finalize+0x79)[0x7ffff7b43bf9]
mpirun[0x4014f1]
mpirun[0x401018]
/usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x7ffff653db15]
mpirun[0x400f29]

and another like:

[hpp@hi-master dynamic (master *)]$ mpirun -np 1 ./spawn_with_env_vars
Spawning...
Spawned
Child got foo and baz env variables -- yay!
*** Error in `mpirun': corrupted double-linked list: 0x00000000006eb350 ***
======= Backtrace: =========
/usr/lib64/libc.so.6(+0x7b184)[0x7ffff6597184]
/usr/lib64/libc.so.6(+0x7d1ec)[0x7ffff65991ec]
/home/hpp/openmpi_3.0.0rc1_install/lib/openmpi/mca_oob_tcp.so(+0x57a2)[0x7ffff32297a2]
/home/hpp/openmpi_3.0.0rc1_install/lib/openmpi/mca_oob_tcp.so(+0x5a87)[0x7ffff3229a87]
/home/hpp/openmpi_3.0.0rc1_install/lib/libopen-rte.so.40(+0x6b032)[0x7ffff7b94032]
/home/hpp/openmpi_3.0.0rc1_install/lib/libopen-pal.so.40(mca_base_framework_close+0x7d)[0x7ffff788592d]
/home/hpp/openmpi_3.0.0rc1_install/lib/openmpi/mca_ess_hnp.so(+0x3e4d)[0x7ffff5b04e4d]
/home/hpp/openmpi_3.0.0rc1_install/lib/libopen-rte.so.40(orte_finalize+0x79)[0x7ffff7b43bf9]
mpirun[0x4014f1]
mpirun[0x401018]
/usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x7ffff653db15]
mpirun[0x400f29]

It doesn't happen on every run, though. I'll do some more investigating, but probably not until next week (a rough sketch of the kind of debug runs I have in mind is below the quoted message).

Howard

2017-06-28 11:50 GMT-06:00 Barrett, Brian via devel <devel@lists.open-mpi.org>:

> The first release candidate of Open MPI 3.0.0 is now available
> (https://www.open-mpi.org/software/ompi/v3.0/). We expect to have at
> least one more release candidate, as there are still outstanding MPI-layer
> issues to be resolved (particularly around one-sided). We are posting
> 3.0.0rc1 to get feedback on run-time stability, as one of the big features
> of Open MPI 3.0 is the update to the PMIx 2 runtime environment. We would
> appreciate any and all testing you can do around run-time behaviors.
>
> Thank you,
>
> Brian & Howard
> _______________________________________________
> devel mailing list
> devel@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>
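P.S. In case anyone wants to poke at this before I get back to it: since the corruption is intermittent and shows up while the OOB components are torn down in orte_finalize, the follow-up I have in mind is simply rerunning the reproducer with glibc's extra malloc checking enabled and then with mpirun under valgrind, to catch the heap getting stomped closer to where it happens. Nothing here is specific to this rc; the test name and paths are just the ones from my runs above:

  # ask glibc for extra heap consistency checks and abort at the first problem
  MALLOC_CHECK_=3 mpirun -np 1 ./spawn_with_env_vars

  # run mpirun itself under valgrind to get the allocation/free sites involved
  valgrind --error-exitcode=1 mpirun -np 1 ./spawn_with_env_vars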