We are using Open MPI on several 1000+ node clusters.  We recently
received several new clusters running the Infiniserve 3.X software
stack and are having a number of problems with the vapi btl (yes, I
know, it is very, very old and shouldn't be used.  I couldn't agree
with you more, but those are my marching orders).

I have a new application that is running into swap for an unknown
reason.  If I force it to use the tcp btl, it doesn't seem to run
into swap (the job just takes a very, very long time).  I have tried
restricting the size of the free lists, forcing send mode, and using
an Open MPI build compiled with no memory manager, but nothing seems
to help.  I've profiled with valgrind --tool=massif and the memtrace
capabilities of ptmalloc, but I don't have any smoking guns yet.  It is
a Fortran app and I don't know anything about debugging Fortran memory
problems.  Can someone point me in the right direction?

Thanks,
Josh
