Hi Mohammed,

On 22 Sep 2013, at 12:57, Mohammed Gaafar wrote:
> Dear EasyBuilders,
>
> I am facing a problem with the OpenMPI/1.4.5-GCC-4.6.3-no-OFED module. It
> doesn't work with some software packages (NWChem, for example) and gives an
> error message similar to this one:
>
> [comp023.local][[32496,1],72][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
> connect() to 192.168.30.24 failed: Connection refused (111)
>
> I have installed the same OpenMPI version manually and it worked fine. Also,
> I have installed another version of OpenMPI using EasyBuild and it worked
> fine. This module matters to us because it is the one included in the
> goalf-1.1 module, which became very popular on our system at BA and which we
> always use to build our software.
>
> What I understand from this error is that some network interfaces are not
> reachable by MPI. On the other hand, those interfaces work fine with the
> other versions of MPI and don't give any error. I don't know if this is
> relevant or not, but it is reporting this error on the InfiniBand network
> (192.168.30.0 subnet).
>
> Any ideas regarding troubleshooting or solutions to this problem?

This is not surprising: the goalf toolchain and the OpenMPI build it uses
have the -no-OFED version suffix, indicating that OpenMPI was built without
InfiniBand support (i.e., with --without-openib).

In the very early easyconfig files we shipped, --without-openib was not used
explicitly. This was found to be a bug, because it allowed OpenMPI to enable
IB support by itself, which doesn't match the -no-OFED version suffix.

As Fotis already suggested, the goolf toolchain (which uses OpenBLAS instead
of ATLAS) is likely to be a better choice if you need/want to stick with an
open source toolchain.

If you want to stick with goalf, you should compose a version that does have
IB support (let us know if you need help there).

I hope this helps, and see you in Cyprus in a couple of weeks. ;-)

regards,

Kenneth
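
P.S. To make the difference between the two OpenMPI variants concrete, here
is a minimal sketch of what the relevant part of an easyconfig could look
like (easyconfig files use Python syntax). Only the -no-OFED suffix and
--without-openib come from the discussion above; the empty versionsuffix and
the use of --with-openib for the IB-enabled variant are assumptions for
illustration, and the options in the easyconfigs we actually ship may differ.

```python
# Hypothetical easyconfig fragment for an OpenMPI build *with* IB support.
# This is an illustrative sketch, not a copy of a shipped easyconfig.
name = 'OpenMPI'
version = '1.4.5'

# Assumption: no version suffix, since this build is not a -no-OFED one.
versionsuffix = ''

toolchain = {'name': 'GCC', 'version': '4.6.3'}

# Explicitly request InfiniBand (openib) support at configure time.
# The -no-OFED variant would instead set:
#   versionsuffix = '-no-OFED'
#   configopts = '--without-openib'
configopts = '--with-openib'
```

Note that a build along these lines would need the OFED/verbs libraries to
be available on the build host, and a goalf-like toolchain would then have
to be composed on top of this OpenMPI variant rather than the -no-OFED one.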

