Sorry for the delay in replying -- I was in Europe for the past two weeks; 
travel always makes me waaaay behind on my INBOX...


On Sep 14, 2010, at 9:56 PM, 张晶 wrote:

> I tried to add a scheduling algorithm to the pml component, ob1, etc.
> Unfortunately, I could only find a paper named "Open MPI: A Flexible High
> Performance MPI" and some annotations in the source files.  From them, I
> know ob1 has implemented round-robin and weighted distribution algorithms.
> But after tracing MPI_Send(), I can't figure out where these are
> implemented, let alone how to add a new scheduling algorithm.
> I have two questions:
> 1. Where is the scheduling algorithm located?

It's complicated -- I'd say that the PML is probably among the most complicated 
sections of Open MPI because it is the main "engine" that enforces the MPI 
point-to-point semantics.  The algorithm is fairly well distributed throughout 
the PML source code.  :-\

> 2. There are five components in the pml framework: cm, crcpw, csum, ob1, 
> and v.  What is the function of each of these components?

cm: this component drives the MTL point-to-point components.  It is mainly a 
thin wrapper for network transports that provide their own MPI-like matching 
semantics.  Hence, most of the MPI semantics are effectively done in the lower 
layer (i.e., in the MTL components and their dependent libraries).  You 
probably won't be able to do much here, because such transports (MX, Portals, 
etc.) do most of their semantics in the network layer -- not in Open MPI.  If 
you have a matching network layer, this is the PML that you probably use (MX, 
Portals, PSM).

crcpw: this is a fork of the ob1 PML; it adds some failover semantics.

csum: this is also a fork of the ob1 PML; it adds checksumming semantics (so 
you can tell if the underlying transport had an error).

v: this PML uses logging and replay to effect some level of fault tolerance.  
It's a distant fork of the ob1 PML, but has quite a few significant differences.

ob1: this is the "main" PML that most users use (TCP, shared memory, 
OpenFabrics, etc.).  It gangs together one or more BTLs to send/receive 
messages across individual network transports.  Hence, it supports true 
multi-device/multi-rail algorithms.  The BML (BTL multiplexing layer) is a thin 
management layer that marshals all the BTLs in the process together -- it's 
mainly array handling, etc.  The ob1 PML is the one that decides 
multi-rail/device splitting, etc.  The INRIA folks just published a paper last 
week at Euro MPI about adjusting the ob1 scheduling algorithm to also take 
NUMA/NUNA/NUIOA effects into account, not just raw bandwidth calculations.
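To give a rough feel for the weighted-splitting idea, here's a tiny sketch in C.  To be clear: this is NOT ob1's actual code or data structures -- the struct and function names are made up for illustration.  It just shows the general shape of striping one message across multiple BTLs in proportion to their relative bandwidth:

```c
#include <stddef.h>

/* Hypothetical sketch of weighted message striping across BTLs, in the
 * spirit of what ob1 does.  These names are illustrative only -- they
 * are NOT Open MPI's real API. */
struct btl_info {
    double weight;   /* relative bandwidth share; weights sum to 1.0 */
};

/* Split total_bytes across n BTLs in proportion to each BTL's weight.
 * The last BTL absorbs any rounding remainder so no bytes are lost. */
void stripe_message(size_t total_bytes,
                    const struct btl_info *btls, int n,
                    size_t *chunk_out)
{
    size_t assigned = 0;
    for (int i = 0; i < n - 1; i++) {
        chunk_out[i] = (size_t)((double)total_bytes * btls[i].weight);
        assigned += chunk_out[i];
    }
    chunk_out[n - 1] = total_bytes - assigned;
}
```

The real scheduling is considerably more involved (eager vs. rendezvous protocols, RDMA pipelining, per-BTL fragment size limits, etc.), but the proportional split is the core intuition.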

Hope this helps!

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/