On Jun 29, 2011, at 3:57 AM, Kawashima wrote:

> First, we created a new BTL component, 'tofu BTL'. It's not so special
> one but dedicated to our Tofu interconnect. But its latency was not
> enough for us.
> 
> So we created a new framework, 'LLP', and its component, 'tofu LLP'.
> It bypasses request object creation in PML and BML/BTL, and sends
> a message immediately if possible.

Gotcha.  Was the sendi pml call not sufficient?  (sendi = "send immediate")  
This call was designed to be part of a latency reduction mechanism.  I forget 
offhand what we don't do before calling sendi, but the rationale was that if 
the message was small enough, we could skip some steps in the sending process 
and "just send it."
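
To make the idea concrete, here is a minimal sketch of a "send immediate" fast path. It is a toy model, not the actual Open MPI sendi interface: the names toy_link, toy_sendi, toy_send and TOY_EAGER_LIMIT are all hypothetical, and the "request" is reduced to a counter. It only shows the shape of the shortcut: try the immediate send first, and create a request only if that fails.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical size limit below which a message may be sent immediately. */
#define TOY_EAGER_LIMIT 64

/* Hypothetical stand-in for a network endpoint. */
struct toy_link {
    unsigned char wire[TOY_EAGER_LIMIT]; /* bytes "on the wire" */
    size_t wire_len;                     /* length of the last immediate send */
    int busy;                            /* nonzero: link cannot accept data now */
    int requests_created;                /* counts slow-path request objects */
};

/* Try to send right away. Returns 1 on success, 0 if the caller
 * has to fall back to the normal (request-based) path. */
int toy_sendi(struct toy_link *link, const void *buf, size_t len)
{
    if (len > TOY_EAGER_LIMIT || link->busy) {
        return 0;
    }
    memcpy(link->wire, buf, len);
    link->wire_len = len;
    return 1;
}

/* Normal entry point: fast path first, request creation only if needed. */
int toy_send(struct toy_link *link, const void *buf, size_t len)
{
    if (toy_sendi(link, buf, len)) {
        return 0; /* sent without creating a request */
    }
    /* Stand-in for allocating, initializing and queuing a request object. */
    link->requests_created++;
    return 0;
}
```

In this sketch, a small message on an idle link never touches the request counter, while a message larger than TOY_EAGER_LIMIT always takes the slow path.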

Note, too, that the coll modules can be layered on top of each other -- e.g., if 
you only implement barrier (and some others) in tofu coll, then you can supply 
NULL for the other function pointers and the coll base will resolve those 
functions to other coll modules automatically.
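
A rough sketch of that layering, as a toy model rather than the real coll base code: every type and function name below (toy_coll_module, toy_coll_table, toy_coll_resolve, and so on) is hypothetical, and the assumption that the highest-priority non-NULL pointer wins is mine, made only for the illustration.

```c
#include <stddef.h>

/* Return values identify which implementation ran (illustration only). */
enum { TOY_FROM_TOFU = 1, TOY_FROM_TUNED = 2 };

typedef int (*toy_barrier_fn)(void);
typedef int (*toy_bcast_fn)(void *buf, int count, int root);

/* Hypothetical, heavily simplified module: a priority plus function
 * pointers. NULL means "this module does not provide the operation". */
struct toy_coll_module {
    const char *name;
    int priority;
    toy_barrier_fn barrier;
    toy_bcast_fn bcast;
};

/* The resolved per-communicator table: one pointer per operation. */
struct toy_coll_table {
    toy_barrier_fn barrier;
    toy_bcast_fn bcast;
};

int toy_tofu_barrier(void) { return TOY_FROM_TOFU; }
int toy_tuned_barrier(void) { return TOY_FROM_TUNED; }
int toy_tuned_bcast(void *buf, int count, int root)
{
    (void)buf; (void)count; (void)root;
    return TOY_FROM_TUNED;
}

/* For each operation, keep the non-NULL pointer that comes from the
 * highest-priority module supplying it. */
void toy_coll_resolve(const struct toy_coll_module *mods, size_t n,
                      struct toy_coll_table *out)
{
    int barrier_prio = 0, bcast_prio = 0;

    out->barrier = NULL;
    out->bcast = NULL;
    for (size_t i = 0; i < n; i++) {
        if (mods[i].barrier != NULL &&
            (out->barrier == NULL || mods[i].priority > barrier_prio)) {
            out->barrier = mods[i].barrier;
            barrier_prio = mods[i].priority;
        }
        if (mods[i].bcast != NULL &&
            (out->bcast == NULL || mods[i].priority > bcast_prio)) {
            out->bcast = mods[i].bcast;
            bcast_prio = mods[i].priority;
        }
    }
}
```

With a "tofu" module that supplies only barrier and a lower-priority "tuned" module that supplies both, the resolved table in this sketch ends up with tofu's barrier and tuned's bcast.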

> Also, we modified tuned COLL to implement interconnect-and-topology-
> specific bcast/allgather/alltoall/allreduce algorithm. These algorithm
> implementations also bypass PML/BML/BTL to eliminate protocol and software
> overhead.

Good.  As Sylvain mentioned, that was the intent of the coll framework -- it 
certainly isn't *necessary* for coll's to always implement their underlying 
sends/receives with the BTL.  The sm coll does this, for example -- it uses its 
own shared memory block for talking to the sm coll's in other processes 
on the same node, but it doesn't go through the sm BTL.

> To achieve above, we created 'tofu COMMON', like sm (ompi/mca/common/sm/).
> 
> Is there interesting one?
> 
> Though our BTL and COLL are quite interconnect-specific, LLP may be
> contributed in the future.

Yes, it may be interesting to see what you did there.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

