On Jul 12, 2012, at 12:04 PM, Paul Kapinos wrote:

> A long time ago, I reported an error in Open MPI:
> http://www.open-mpi.org/community/lists/users/2012/02/18565.php
>
> Well, in 1.6 the behaviour has changed: the test case doesn't hang forever
> and block an InfiniBand interface, but seems to run through, and now this
> error message is printed:
> --------------------------------------------------------------------------
> The OpenFabrics (openib) BTL failed to register memory in the driver.
> Please check /var/log/messages or dmesg for driver specific failure
> reason.
We updated our out-of-registered-memory handling, but accidentally left this warning message in (it has since been removed).

Here's what's happening: Mellanox changed the default amount of registered memory that is available -- they dramatically reduced it. We haven't gotten a good answer yet as to *why* this change was made. You can change some kernel-level parameters to increase it again, and then OMPI should work fine (a sketch of the relevant module options is appended at the end of this mail).

Here's an IBM article about it:

http://www.ibm.com/developerworks/wikis/display/hpccentral/Using+RDMA+with+pagepool+larger+than+8GB

And here are some comments that Mellanox made on a ticket about this issue (including some corrections/clarifications to that IBM article):

https://svn.open-mpi.org/trac/ompi/ticket/3134#comment:12

-----

Basically, what's happening is that OMPI behaves badly when it runs out of registered memory. We have tried two things to make this better (i.e., still perform *correctly*, albeit at a lower performance level), and we're not sure yet whether they work properly:

1. When OMPI tries to register more memory for an RDMA message transaction and fails, it falls back to send-receive (for which we already have pre-registered memory available). However, this can still end up hanging because of OMPI's "lazy connection" scheme, where OMPI doesn't open IB connections between MPI processes until the first time each pair of processes communicates. So if OMPI runs out of registered memory and then tries to open a new IB connection to a new peer -- kaboom.

2. When OMPI starts up, it guesstimates how much memory can be registered and divides it equally among all the OMPI processes *in that job* on the same node.

We have had mixed reports of whether this works. I made a 1.6.x tarball with this fix in it; could you give it a whirl (with the default low registered-memory kernel parameters, to ensure that you can trigger the "out of registered memory" issue)?

http://www.open-mpi.org/~jsquyres/unofficial/

Use the openmpi-1.6.1ticket3131r26612M.tar.bz2 tarball.

#2 is the latest attempt to fix it, but we haven't had good testing of it. Could you give it a whirl and let us know what happens? (Build-and-test steps are sketched at the end of this mail.)

-- 
Jeff Squyres
jsquy...@cisco.com

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
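For the kernel-parameter change mentioned above, here's a minimal sketch, assuming an mlx4-based HCA. The file path, the 64 GiB RAM figure, and the specific option values are illustrative assumptions -- check your distro's modprobe layout and the Mellanox release notes for your driver before applying anything:

    # /etc/modprobe.d/mlx4_core.conf  (path is an assumption; varies by distro)
    #
    # The mlx4 driver caps registerable memory at approximately:
    #   max_reg_mem = (2^log_num_mtt) * (2^log_mtts_per_seg) * PAGE_SIZE
    # Rule of thumb: make that at least 2x physical RAM.
    # Example for a 64 GiB node with 4 KiB pages:
    #   2^23 * 2^3 * 4 KiB = 256 GiB >= 2 * 64 GiB
    options mlx4_core log_num_mtt=23 log_mtts_per_seg=3

After changing the options, the driver has to be reloaded (or the node rebooted) for the new limit to take effect. Also note that, per #2 above, whatever limit you end up with gets divided among all the OMPI processes of a job on that node, so size it for your fullest node, not for a single process.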
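And here's a sketch of the build-and-test steps for the tarball above. The install prefix, process count, BTL list, and test binary are placeholders; the "mpi_preconnect_mpi" MCA parameter is an assumption (it forces all connections open during MPI_Init, so the lazy-connection hang from #1 can't confuse the result -- use it only if your build has it):

    $ wget http://www.open-mpi.org/~jsquyres/unofficial/openmpi-1.6.1ticket3131r26612M.tar.bz2
    $ tar xjf openmpi-1.6.1ticket3131r26612M.tar.bz2
    $ cd openmpi-1.6.1ticket3131r26612M
    $ ./configure --prefix=$HOME/ompi-test && make -j 8 install   # prefix is a placeholder
    $ export PATH=$HOME/ompi-test/bin:$PATH
    $ export LD_LIBRARY_PATH=$HOME/ompi-test/lib:$LD_LIBRARY_PATH

    # Leave the registered-memory parameters at their low defaults, then run
    # the original reproducer (binary name and process count are placeholders):
    $ mpirun -np 64 --mca btl openib,sm,self ./your_testcase

    # Optionally pre-open all IB connections during MPI_Init to rule out the
    # lazy-connection hang from point #1 (assumes this parameter exists in 1.6):
    $ mpirun -np 64 --mca mpi_preconnect_mpi 1 --mca btl openib,sm,self ./your_testcase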