Hello Oliver,
thanks for the update.

Just my $0.02: the upcoming Open MPI v1.5 will warn users if their session directory is on NFS (or Lustre).
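
Until then, a check along these lines can be done by hand. The following is only a minimal Python sketch, not how Open MPI implements its warning: it looks up the filesystem type of a candidate session directory in a mount table in /proc/mounts format, and builds an mpirun command line that sets orte_tmpdir_base as described in the quoted message below. The helper names, the sample mount table, the /dev/shm mount point and the ./benchmark program are all assumptions made for illustration.

```python
import os

# Filesystem types treated as "on the network" (assumed list for this sketch).
NETWORK_FS = {"nfs", "nfs4", "lustre"}


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` must be in /proc/mounts format:
    device mountpoint fstype options dump pass
    The longest matching mount point wins; for equal mount points the
    later entry wins, because a later mount shadows an earlier one.
    """
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fstype = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fstype
    return best_type


def is_network_fs(path, mounts_text):
    """True if `path` lives on one of the filesystems in NETWORK_FS."""
    return fs_type_for(path, mounts_text) in NETWORK_FS


def mpirun_command(tmpdir, nprocs, program):
    """Build an mpirun argument list that keeps the session directory in `tmpdir`."""
    return ["mpirun", "--mca", "orte_tmpdir_base", tmpdir,
            "-np", str(nprocs), program]


# Hypothetical mount table of a diskless node: root on NFS, /dev/shm on tmpfs.
SAMPLE_MOUNTS = """\
server:/export/root / nfs rw 0 0
tmpfs /dev/shm tmpfs rw 0 0
"""

if is_network_fs("/tmp", SAMPLE_MOUNTS):
    print("warning: /tmp is on a network filesystem")
print(" ".join(mpirun_command("/dev/shm", 4, "./benchmark")))
```

On a real node, the contents of /proc/mounts would be passed in instead of SAMPLE_MOUNTS, and the tmpfs mount point would be whatever was set up on the diskless nodes.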

Best regards,
Rainer

On Thursday 22 April 2010 11:37:48 am Oliver Geisler wrote:
> To sum up and give an update:
>
> The extended communication times while using shared memory communication
> of openmpi processes are caused by openmpi session directory laying on
> the network via NFS.
>
> The problem is resolved by establishing on each diskless node a ramdisk
> or mounting a tmpfs. By setting the MCA parameter orte_tmpdir_base to
> point to the according mountpoint shared memory communication and its
> files are kept local, thus decreasing the communication times by
> magnitudes.
>
> The relation of the problem to the kernel version is not really
> resolved, but maybe not "the problem" in this respect.
> My benchmark is now running fine on a single node with 4 CPU, kernel
> 2.6.33.1 and openmpi 1.4.1.
> Running on multiple nodes I experience still higher (TCP) communication
> times than I would expect. But that requires me some more deep
> researching the issue (e.g. collisions on the network) and should
> probably posted to a new thread.
>
> Thank you guys for your help.
>
> oli
>

-- 
------------------------------------------------------------------------
Rainer Keller, PhD          Tel: +1 (865) 241-6293
Oak Ridge National Lab      Fax: +1 (865) 241-4811
PO Box 2008 MS 6164         Email: kel...@ornl.gov
Oak Ridge, TN 37831-2008    AIM/Skype: rusraink