Hi Folks,

I'm making progress with issue #1559 (the patches on the mailing list didn't help), and I'll open a PR to help the PSM2 MTL work on a single node, but I'm noticing something more troublesome.

If I run on just one node and use more than one process, process zero consistently hangs in psm2_ep_connect. I've tried using the psm2 code on github at sha e951cf31, but I still see the same behavior.

The PSM2 related rpms installed on our system are:

infinipath-psm-devel-3.3-0.g6f42cdb1bb8.2.el7.x86_64
hfi1-psm-0.7-221.ch6.x86_64
hfi1-psm-devel-0.7-221.ch6.x86_64
infinipath-psm-3.3-0.g6f42cdb1bb8.2.el7.x86_64

Should we get newer rpms installed?

Is there a way to disable the AMSHM path? I'm wondering if that would help, since multi-node jobs seem to run fine.

Thanks for any help,

Howard