On Jun 29, 2011, at 3:57 AM, Kawashima wrote:

> First, we created a new BTL component, 'tofu BTL'. It's not so special
> one but dedicated to our Tofu interconnect. But its latency was not
> enough for us.
>
> So we created a new framework, 'LLP', and its component, 'tofu LLP'.
> It bypasses request object creation in PML and BML/BTL, and sends
> a message immediately if possible.

Gotcha. Was the sendi pml call not sufficient? (sendi = "send immediate") This call was designed to be part of a latency reduction mechanism. I forget offhand what we don't do before calling sendi, but the rationale was that if the message was small enough, we could skip some steps in the sending process and "just send it."

Note, too, that the coll modules can be layered on top of each other -- e.g., if you only implement barrier (and some others) in tofu coll, then you can supply NULL for the other function pointers and the coll base will resolve those functions to other coll modules automatically.

> Also, we modified tuned COLL to implement interconnect-and-topology-
> specific bcast/allgather/alltoall/allreduce algorithm. These algorithm
> implementations also bypass PML/BML/BTL to eliminate protocol and software
> overhead.

Good. As Sylvain mentioned, that was the intent of the coll framework -- it certainly isn't *necessary* for colls to always implement their underlying sends/receives with the BTL. The sm coll does this, for example -- it uses its own shared memory block for talking to the sm colls in other processes on the same node, but it doesn't go through the sm BTL.

> To achieve above, we created 'tofu COMMON', like sm (ompi/mca/common/sm/).
>
> Is there interesting one?
>
> Though our BTL and COLL are quite interconnect-specific, LLP may be
> contributed in the future.

Yes, it may be interesting to see what you did there.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
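
For readers following the thread, the coll layering described above (supply NULL for the functions you don't implement and let the base fill them in from other modules) can be sketched roughly as follows. This is a simplified illustration, not Open MPI's actual data structures or selection code: the `coll_module_t` and `coll_table_t` types, the `coll_resolve` function, the priorities, and the stub functions are all hypothetical names invented for the example.

```c
#include <stddef.h>

/* Hypothetical stand-in for a collective entry point. Real collectives
 * take buffers, counts, communicators, etc.; here each stub just reports
 * which module it belongs to. */
typedef const char *(*coll_fn_t)(void);

/* Hypothetical, simplified module descriptor (NOT the real Open MPI struct). */
typedef struct {
    const char *name;
    int priority;        /* higher priority wins */
    coll_fn_t barrier;   /* NULL means "this module does not implement it" */
    coll_fn_t bcast;
} coll_module_t;

/* The per-communicator table the base would end up with. */
typedef struct {
    coll_fn_t barrier;
    coll_fn_t bcast;
} coll_table_t;

static const char *tofu_barrier(void)  { return "tofu"; }
static const char *tuned_barrier(void) { return "tuned"; }
static const char *tuned_bcast(void)   { return "tuned"; }

/* For each operation, keep the function from the highest-priority module
 * that supplies a non-NULL pointer; NULL entries are simply skipped. */
static coll_table_t coll_resolve(const coll_module_t *mods, size_t n)
{
    coll_table_t t = { NULL, NULL };
    int barrier_prio = 0, bcast_prio = 0;

    for (size_t i = 0; i < n; i++) {
        if (mods[i].barrier != NULL &&
            (t.barrier == NULL || mods[i].priority > barrier_prio)) {
            t.barrier = mods[i].barrier;
            barrier_prio = mods[i].priority;
        }
        if (mods[i].bcast != NULL &&
            (t.bcast == NULL || mods[i].priority > bcast_prio)) {
            t.bcast = mods[i].bcast;
            bcast_prio = mods[i].priority;
        }
    }
    return t;
}

/* Example setup: an interconnect-specific module that only provides
 * barrier, layered over a general-purpose module that provides both.
 * The priority values are arbitrary. */
static coll_table_t demo_table(void)
{
    const coll_module_t mods[] = {
        { "tuned", 30, tuned_barrier, tuned_bcast },
        { "tofu",  90, tofu_barrier,  NULL        },
    };
    return coll_resolve(mods, sizeof(mods) / sizeof(mods[0]));
}
```

In this sketch, `demo_table()` yields a table whose barrier comes from the tofu stub and whose bcast falls through to the tuned stub, which is the behavior the NULL-pointer convention is meant to give.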