Errata: PSM2_DEVICES="self,hfi"
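
For anyone scripting this, here is a minimal sketch of one way to apply the corrected setting from a Python launcher. The helper names (psm2_env_without_shm, mpirun_command) and the application path are hypothetical. It assumes Open MPI's mpirun with its -x option for exporting an environment variable to the ranks, and assumes the shared-memory device is the "shm" entry that "self,hfi" leaves out. It only builds the environment and the command line; it does not launch anything.

```python
import os


def psm2_env_without_shm(base_env=None):
    """Return a copy of the environment with PSM2_DEVICES set to "self,hfi".

    Assumption: leaving "shm" out of the device list is what disables the
    PSM2 shared-memory path, per the setting quoted in this thread.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PSM2_DEVICES"] = "self,hfi"
    return env


def mpirun_command(app, nprocs=2):
    """Build an mpirun argument list (hypothetical helper, nothing is run).

    -x PSM2_DEVICES asks Open MPI's mpirun to export that variable from the
    launching environment to the ranks; --mca mtl psm2 selects the PSM2 MTL.
    """
    return [
        "mpirun",
        "-np", str(nprocs),
        "-x", "PSM2_DEVICES",
        "--mca", "mtl", "psm2",
        app,
    ]
```

To try it, pass both results to subprocess.run, for example subprocess.run(mpirun_command("./my_mpi_app"), env=psm2_env_without_shm()), where ./my_mpi_app is a placeholder for the real binary.
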
_MAC

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Cabral, Matias A
Sent: Tuesday, April 19, 2016 11:25 AM
To: Open MPI Developers <de...@open-mpi.org>
Subject: Re: [OMPI devel] PSM2 Intel folks question

Hi Howard,

Couple more questions to understand a little better the context:
- What type of job is running?
- Is this also under srun?

For PSM2 you may find more details in the programmer’s guide:
http://www.intel.com/content/dam/support/us/en/documents/network/omni-adptr/sb/Intel_PSM2_PG_H76473_v1_0.pdf

To disable shared memory:
Section 2.7.1: PSM2_DEVICES="self,fi"

Thanks,
_MAC

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Howard Pritchard
Sent: Tuesday, April 19, 2016 11:04 AM
To: Open MPI Developers List <de...@open-mpi.org>
Subject: [OMPI devel] PSM2 Intel folks question

Hi Folks,

I'm making progress with issue #1559 (patches on the mail list didn't help), and I'll open a PR to help the PSM2 MTL work on a single node, but I'm noticing something more troublesome.

If I run on just one node, and I use more than one process, process zero consistently hangs in psm2_ep_connect.

I've tried using the psm2 code on github - at sha e951cf31, but I still see the same behavior.

The PSM2 related rpms installed on our system are:

infinipath-psm-devel-3.3-0.g6f42cdb1bb8.2.el7.x86_64
hfi1-psm-0.7-221.ch6.x86_64
hfi1-psm-devel-0.7-221.ch6.x86_64
infinipath-psm-3.3-0.g6f42cdb1bb8.2.el7.x86_64

Should we get newer rpms installed?

Is there a way to disable the AMSHM path? I'm wondering if that would help, since multi-node jobs seem to run fine.

Thanks for any help,

Howard