FYI: about six months ago, several of us spent some time coming up with a plan to deal with the latency problems in Open MPI. George has been implementing the send-side changes for this optimization over the last several months, but has not had time to get to the receive side. Galen is picking this up and will be checking in changes over the next several weeks. The gist of the changes is a move from an active-message tag approach with one tag per protocol (ptp, one-sided, etc.) to an 8-bit global tag space with finer-grained functions (short message, rendezvous packet, ...), along with some function consolidation.
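To make the tag change concrete, here is a rough sketch of what an 8-bit global tag space with one callback per function could look like. This is only one reading of the plan, not the code being checked in: every name in it (`TAG_PTP_SHORT`, `tag_register`, `tag_dispatch`, and so on) is hypothetical, and the real tag assignments and callback signatures may well differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical illustrations, not Open MPI's actual API. */
enum {
    TAG_PTP_SHORT = 0,   /* point-to-point short message (assumed tag value) */
    TAG_PTP_RNDV  = 1,   /* point-to-point rendezvous packet (assumed) */
    TAG_OSC_PUT   = 2,   /* a one-sided operation (assumed) */
    TAG_MAX       = 256  /* number of values in an 8-bit tag space */
};

/* Receive-side handler invoked for a fragment carrying a given tag. */
typedef void (*recv_cb_t)(const void *payload, size_t len, void *ctx);

typedef struct {
    recv_cb_t cb;   /* NULL means no handler is registered for this tag */
    void *ctx;      /* opaque pointer handed back to the handler */
} tag_entry_t;

/* One global table shared by all protocols, indexed directly by the tag. */
static tag_entry_t tag_table[TAG_MAX];

/* Register a handler for a tag; returns -1 if cb is NULL or the tag is taken. */
static int tag_register(uint8_t tag, recv_cb_t cb, void *ctx)
{
    if (cb == NULL || tag_table[tag].cb != NULL) {
        return -1;
    }
    tag_table[tag].cb = cb;
    tag_table[tag].ctx = ctx;
    return 0;
}

/* Dispatch an incoming fragment with a single table lookup on its tag. */
static int tag_dispatch(uint8_t tag, const void *payload, size_t len)
{
    const tag_entry_t *entry = &tag_table[tag];
    if (entry->cb == NULL) {
        return -1;  /* no handler registered for this tag */
    }
    entry->cb(payload, len, entry->ctx);
    return 0;
}

/* Example handler: adds the fragment length to a size_t counter in ctx. */
static void count_bytes(const void *payload, size_t len, void *ctx)
{
    (void)payload;
    assert(ctx != NULL);
    *(size_t *)ctx += len;
}
```

With something like this, a short message goes straight to its own handler, e.g. `tag_register(TAG_PTP_SHORT, count_bytes, &total)` followed by `tag_dispatch(TAG_PTP_SHORT, buf, len)`, rather than first landing in a per-protocol handler that then has to decode the packet type.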
Rich