On Mar 23, 2010, at 4:02 AM, Christoph Konersmann wrote:

> It was long ago where I've asked about hints to implement a dynamic BTL
> control. I've currently managed to change the MPI communication path
> from a BTL module (e.g. openib) to another BTL module (e.g. tcp) at
> runtime of a distributed application.
>
> For this I've developed a so called BTL Control Client (orte-btlctl) to
> send control messages to all processes through the ORTE RML.

Cool! FWIW, you might want to name it ompi-btlctl. ORTE is our run-time layer and has no knowledge of the BTLs.

> These
> messages are received and processed in the OMPI BML. In BML I've
> implemented a function to stop the MPI communication and another for
> changing the BTL exclusivity and recalculating the btl_{send,eager,rdma}
> lists. All is done at runtime so a distributed application running with
> Open MPI is not affected in its computation.
>
> I also managed to unload a module not used anymore, e.g. openib after
> changing the MPI communication to tcp, through the already implemented
> function mca_bml_r2_del_btl(mca_btl_base_module_t* btl).

Sounds great!

> The Question:
> The function to (re)initialise a BTL module
> "mca_bml_r2_add_btl(mca_btl_base_module_t* btl)" is currently not
> implemented. Why is it not implemented? And what has to be done if I
> want to implement it?

I'm actually not sure -- this is not an area of the code where I am an expert...

It looks like the r2 proc_add is calling the internal function add_btls (plural). I don't know where in the code base bml.add_btl gets called...? (Does anything call it?) It may have been planned but then never used...?

> As far as I understood the internals of the OMPI Layer, for adding a BTL
> module you have to implement the following steps:
> 1. find the corresponding component in mca_btl_base_components_opened
> 2. Do component->btl_init to get an array of BTL modules
> 3. and add those to mca_btl_base_modules_initialized
> 4. Iterate through mca_btl_base_modules_initialized and add BTL module
>    to mca_bml_r2.btl_modules in bml_r2
> 5. Add BTL module to btl_{send,eager,rdma} (if applicable) for all
>    reachable procs

This *sounds* right, but again, I'm not the expert in this part of the code base.

> The Background:
> I should give some background, why I'm implementing this.
> Changing the MPI communication from a high speed network to a network
> with flow control (openib->tcp) is necessary for checkpointing distributed
> applications in virtual machines. Ok, you are able to checkpoint through
> the FT-Framework and BLCR in Open MPI, but virtual machines already
> provide trivial functions for checkpointing. As you are not able to
> checkpoint the hardware information of e.g. openib you have to get rid
> of it in case of a checkpoint, and change back again on resume/continue.

I'm not quite sure I understand. I can see how the original model of CRS and SNAPC doesn't quite fit that of VMs, but I don't quite understand what switching openib -> tcp and then later tcp -> openib gives you...?

Can't you just quiesce the openib BTL, let the VM checkpoint, and then resume with openib? (or whatever other non-TCP/sm BTL you want)

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
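
For what it's worth, the five steps quoted above can be sketched as a small toy model in C. This is only an illustration of the ordering of the steps, not Open MPI code: every type and function below (toy_module_t, toy_component_t, toy_state_t, toy_add_btl, toy_tcp_init) is a hypothetical stand-in, and the real selection logic for the send/eager/rdma lists is not modelled; here every reachable proc simply gets the module on its send and eager lists, and on its rdma list only if the module claims RDMA support.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_MODULES 8
#define MAX_PROCS   4

/* Hypothetical stand-in for a BTL module; NOT mca_btl_base_module_t. */
typedef struct toy_module {
    const char *component_name;
    int supports_rdma;
} toy_module_t;

/* Hypothetical stand-in for an opened BTL component. */
typedef struct toy_component {
    const char *name;
    /* Fills `out` with at most `cap` modules and returns how many. */
    size_t (*init)(toy_module_t *out, size_t cap);
} toy_component_t;

typedef struct toy_list {
    const toy_module_t *items[MAX_MODULES];
    size_t len;
} toy_list_t;

/* Per-peer lists, loosely mirroring btl_{send,eager,rdma}. */
typedef struct toy_proc {
    int reachable;
    toy_list_t send, eager, rdma;
} toy_proc_t;

typedef struct toy_state {
    toy_module_t initialized[MAX_MODULES]; /* ~ modules_initialized */
    size_t n_initialized;
    toy_list_t bml_modules;                /* ~ mca_bml_r2.btl_modules */
    toy_proc_t procs[MAX_PROCS];
    size_t n_procs;
} toy_state_t;

static int list_push(toy_list_t *l, const toy_module_t *m)
{
    if (l->len >= MAX_MODULES) return -1;
    l->items[l->len++] = m;
    return 0;
}

/* Example component hook: yields a single non-RDMA "tcp" module. */
size_t toy_tcp_init(toy_module_t *out, size_t cap)
{
    if (cap < 1) return 0;
    out[0] = (toy_module_t){ "tcp", 0 };
    return 1;
}

/* Returns the number of modules added, or -1 on failure. */
int toy_add_btl(toy_state_t *st, const toy_component_t *opened,
                size_t n_opened, const char *name)
{
    assert(st != NULL && opened != NULL && name != NULL);

    /* Step 1: find the component among the opened ones. */
    const toy_component_t *comp = NULL;
    for (size_t i = 0; i < n_opened; i++) {
        if (strcmp(opened[i].name, name) == 0) { comp = &opened[i]; break; }
    }
    if (comp == NULL) return -1;

    /* Step 2: ask the component for its modules. */
    toy_module_t fresh[MAX_MODULES];
    size_t n = comp->init(fresh, MAX_MODULES);
    if (n > MAX_MODULES - st->n_initialized) return -1; /* no room left */

    for (size_t i = 0; i < n; i++) {
        /* Step 3: record the module as initialized. */
        toy_module_t *m = &st->initialized[st->n_initialized++];
        *m = fresh[i];

        /* Step 4: make it known to the BML. */
        list_push(&st->bml_modules, m);

        /* Step 5: add it to the lists of every reachable proc. */
        for (size_t p = 0; p < st->n_procs; p++) {
            toy_proc_t *proc = &st->procs[p];
            if (!proc->reachable) continue;
            list_push(&proc->send, m);
            list_push(&proc->eager, m);
            if (m->supports_rdma) list_push(&proc->rdma, m);
        }
    }
    return (int)n;
}
```

A caller would zero-initialize a toy_state_t, mark which procs are reachable, and call toy_add_btl(&st, opened, n_opened, "tcp"); asking for a component name that is not in the opened array returns -1 and leaves the state untouched.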