Hi Howard,

A couple more questions to help me understand the context a little better:

-          What type of job are you running?

-          Is this also launched under srun?

You can find more details on PSM2 in the programmer's guide:
http://www.intel.com/content/dam/support/us/en/documents/network/omni-adptr/sb/Intel_PSM2_PG_H76473_v1_0.pdf

To disable shared memory, set the following (see Section 2.7.1):
PSM2_DEVICES="self,fi"
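A minimal sketch of applying this setting to a single-node test, assuming the job is launched with Open MPI's mpirun (whose -x option forwards a named environment variable to the ranks). The helper name run_no_shm is hypothetical, and the PSM2_DEVICES value is copied verbatim from above, so check the exact device names against Section 2.7.1 of the guide:

```shell
# Hypothetical helper: run_no_shm NPROCS APP [ARGS...]
# Runs APP on NPROCS ranks with PSM2 limited to the devices listed in
# PSM2_DEVICES; the shared-memory device is left out of the list.
run_no_shm() {
    np="$1"
    shift
    # The prefix assignment applies only to this mpirun invocation;
    # -x PSM2_DEVICES forwards it to every rank.
    PSM2_DEVICES="self,fi" mpirun -np "$np" -x PSM2_DEVICES "$@"
}
```

For example, `run_no_shm 2 ./hello` would start two ranks with shared memory disabled. Under srun, exporting PSM2_DEVICES before the launch should have the same effect, since srun propagates the caller's environment by default.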

Thanks,
_MAC

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Howard Pritchard
Sent: Tuesday, April 19, 2016 11:04 AM
To: Open MPI Developers List <de...@open-mpi.org>
Subject: [OMPI devel] PSM2 Intel folks question

Hi Folks,

I'm making progress with issue #1559 (the patches posted to the mailing list didn't help),
and I'll open a PR to help the PSM2 MTL work on a single node, but I'm
noticing something more troublesome.

If I run on just one node, and I use more than one process, process zero
consistently hangs in psm2_ep_connect.

I've tried the PSM2 code from GitHub at SHA e951cf31, but I still see
the same behavior.

The PSM2 related rpms installed on our system are:

infinipath-psm-devel-3.3-0.g6f42cdb1bb8.2.el7.x86_64
hfi1-psm-0.7-221.ch6.x86_64
hfi1-psm-devel-0.7-221.ch6.x86_64
infinipath-psm-3.3-0.g6f42cdb1bb8.2.el7.x86_64

Should we get newer RPMs installed?

Is there a way to disable the AMSHM path?  I'm wondering if that
would help, since multi-node jobs seem to run fine.

Thanks for any help,

Howard
