Re: [OMPI devel] Changing BTLs at runtime

Josh Hursey Mon, 29 Mar 2010 12:24:34 -0400

This line of work sounds interesting. Just wanted to add my 2 cents on one point below.


On Mar 26, 2010, at 9:46 AM, Christoph Konersmann wrote:

The Background:
I should give some background on why I'm implementing this. Changing the
MPI communication from a high-speed network to a network with
flow control (openib->tcp) is necessary for checkpointing distributed
applications in virtual machines. OK, you are able to checkpoint through
the FT-Framework and BLCR in Open MPI, but virtual machines already
provide trivial functions for checkpointing. As you are not able to
checkpoint the hardware information of e.g. openib, you have to get rid
of it in case of a checkpoint, and change back again on resume/continue.

I'm not quite sure I understand. I can see how the original model of CRS and SNAPC doesn't quite fit that of VMs, but I don't quite understand what switching openib -> tcp and then later tcp -> openib gives you...?
Can't you just quiesce the openib BTL, let the VM checkpoint, and then resume with openib? (or whatever other non-TCP/sm BTL you want)

I worked under the assumption that a virtualization environment might support InfiniBand through SR-IOV, so every virtual machine has the possibility to use it at full speed. Just starving out the communication between InfiniBand devices would not help in case of migration, when the underlying hardware and its configuration would change. Therefore I have to unload the desired BTL module. To make sure that absolutely no BML uses InfiniBand for transfers anymore, I change the communication to another device whose protocol is known to work with migrating virtual machines, like tcp.

A few papers have pointed out the difficulties of supporting InfiniBand in a virtualization environment where migration is a wanted feature. Most solutions involve shutting down the InfiniBand network, moving the process, then restarting the communication. It's a neat idea to shift the network load to the TCP network to allow the application to continue communication (though at diminished performance) during the migration to work around the InfiniBand issue.

Checkpointing would work by just quiescing the communication if the InfiniBand hardware does not change.

Just wanted to mention that in Open MPI we have the ability to choose a new set of BTLs on restart in our current C/R infrastructure. So we can checkpoint process A, which was communicating with process B over 'openib', and then restart them on the same machine and have them transparently switch to 'sm'. Then we can move them apart and have them pick another set of BTLs for communication (either 'tcp' or back to 'openib' or something else entirely like 'mx').
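
To make the re-selection idea concrete, here is a toy model in Python. It is only a sketch and not Open MPI code: the `Proc` type, the `select_btl` and `restart` helpers, and the preference order are all hypothetical, invented for illustration. It assumes each process advertises the transports it could use after a restart and that the pair picks the most preferred one they share.

```python
# Toy model only: NOT Open MPI code. All names below are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Proc:
    name: str
    host: str
    btls: frozenset  # transports this process could use after (re)start


# Assumed preference order, most preferred first (an illustration only).
PREFERENCE = ("sm", "openib", "mx", "tcp")


def select_btl(a: Proc, b: Proc) -> str:
    """Pick one transport for the pair (a, b) at (re)start time."""
    common = a.btls & b.btls
    for btl in PREFERENCE:
        if btl not in common:
            continue
        # Shared memory is only usable between processes on the same host.
        if btl == "sm" and a.host != b.host:
            continue
        return btl
    raise RuntimeError("no common transport between %s and %s" % (a.name, b.name))


def restart(p: Proc, host: str, btls) -> Proc:
    """Model a restart: same process, possibly new host and transports."""
    return Proc(p.name, host, frozenset(btls))


# Before the checkpoint: A and B on different nodes, talking over openib.
a = Proc("A", "node1", frozenset({"openib", "tcp", "sm"}))
b = Proc("B", "node2", frozenset({"openib", "tcp", "sm"}))
before = select_btl(a, b)

# Restart both on the same machine: the pair can switch to sm.
a1 = restart(a, "node1", a.btls)
b1 = restart(b, "node1", b.btls)
colocated = select_btl(a1, b1)

# Move them apart onto hosts without InfiniBand: fall back to tcp.
a2 = restart(a1, "node3", {"tcp", "sm"})
b2 = restart(b1, "node4", {"tcp", "sm"})
apart = select_btl(a2, b2)
```

In this model `before`, `colocated` and `apart` come out as 'openib', 'sm' and 'tcp' respectively, which mirrors the sequence described above. The real selection logic has more inputs than a fixed preference tuple.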


-- Josh


Kind regards,
Christoph Konersmann
--
Paderborn Center for Parallel Computing - PC2
University of Paderborn - Germany
http://www.pc2.de

Christoph Konersmann <[email protected]>
_______________________________________________
devel mailing list
[email protected]
http://www.open-mpi.org/mailman/listinfo.cgi/devel
