We have had a FAQ on this for a long time... the problem is, nobody reads it :-/

Glad you found the problem!
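For the record, the gist of that FAQ entry is to point Open MPI's session directory at a node-local filesystem with the orte_tmpdir_base MCA parameter. Something like the sketch below -- the process count and ./a.out are placeholders, and /dev/shm plus btl_sm_num_fifos=9 are simply the values Oskar reports using further down:

    # one-off, on the mpirun command line
    mpirun --mca orte_tmpdir_base /dev/shm --mca btl_sm_num_fifos 9 -np 8 ./a.out

    # or persistently, in $HOME/.openmpi/mca-params.conf
    orte_tmpdir_base = /dev/shm
    btl_sm_num_fifos = 9

Any node-local directory works; /dev/shm is just a convenient tmpfs that exists even on diskless nodes, so the files the sm BTL mmap()s never touch NFS.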
On May 14, 2010, at 3:15 PM, Paul H. Hargrove wrote:

> Oskar Enoksson wrote:
>> Christopher Samuel wrote:
>>>
>>> Subject: Re: [OMPI devel] Very poor performance with btl sm on twin
>>> nehalem servers with Mellanox ConnectX installed
>>>
>>> On 13/05/10 20:56, Oskar Enoksson wrote:
>>>
>>>> The problem is that I get very bad performance unless I
>>>> explicitly exclude the "sm" btl and I can't figure out why.
>>>
>>> Recently someone reported issues which were traced back to
>>> the fact that the files that sm uses for mmap() were in a
>>> /tmp which was NFS mounted; changing the location where their
>>> files were kept to another directory with the orte_tmpdir_base
>>> MCA parameter fixed that issue for them.
>>>
>>> Could it be similar for yourself?
>>>
>>> cheers,
>>> Chris
>>
>> That was exactly right; as you guessed, these are diskless nodes that
>> mount the root filesystem over NFS.
>>
>> Setting orte_tmpdir_base to /dev/shm and btl_sm_num_fifos=9 and then
>> running mpi_stress on eight cores measures speeds of 1650MB/s for
>> 1MB messages and 1600MB/s for 10kB messages.
>>
>> Thanks!
>> /Oskar
>
> Sounds like a new FAQ entry is warranted.
>
> -Paul
>
> --
> Paul H. Hargrove                          phhargr...@lbl.gov
> Future Technologies Group
> HPC Research Department                   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900