On 4/29/09 16:07 , "Woodruff, Robert J" <[email protected]> wrote:
> Brian wrote,
>
>> And Open Fabrics is the only "commodity" interface that makes implementers
>> go through these pains. Myrinet's MX, Cray's Portals, and Quadrics' Tports
>> all handle the issues either at the driver library or kernel module level.
>
> One important note is that in general, Myrinet, Quadrics, and even Portals
> were designed primarily to run MPI, so it is not a surprise that their
> interfaces map almost 1:1 to the MPI interfaces. Also, note that all of
> these use a tag-matching capability, which also seems to map well to MPI.
> RDMA/OFA verbs were designed to be a more general interface to
> support lots of ULPs, networking (tcp/ip), storage, etc, not just MPI.

True, although any of those could be extended to support the features
necessary for storage and such (and many already support IP). The code
complexity claim is also true of sockets (TCP, in particular). It's a lot
less code and doesn't make us jump through nearly as many hoops. Obviously
it doesn't perform as well, but 5-6x the code complexity for OFED isn't a
good thing.

> That said, for hardware that does support these tag-matching capabilities,
> like Myrinet, Qlogic's HCA (i.e. PSM), OpenMX, and even Quadrics, maybe OFA
> should have a generic tag-matching set of verbs that the MPIs could use
> instead of the RDMA verbs. The IHVs, like Qlogic, MX, and others that
> support tag-matching could plug into this generic tag-matching
> infrastructure. The MPIs would then only have to write one driver in MPI to
> support all these different IHVs that support tag-matching, and that MPI
> driver would be a very simple one, since the tag-matching verbs would map
> almost 1:1 to the MPI interfaces, like MX or PSM do.

I think there are other problems with the verbs interface that would still
make MPI implementers twitch (some of which are in the slides Jeff sent out
to begin this discussion). But I certainly wouldn't say no to a real set of
tag matching primitives.
Of course, that opens a whole can of worms that I'm not sure OFED is ready
to deal with. It also may or may not solve the memory registration problem.
If the memory in the matching verb still had to be registered, we haven't
solved the problem that started this discussion. So the verb would have to
also handle memory registration, which seems to go against the general
"OFA way".

> Heck, maybe we should even encourage the IBTA and iWARP associations to add
> tag-matching as a feature to the next version of the IBTA and iWARP specs.
> If they did that, it would make the MPI implementers' life a lot easier.
> I would rather see that done, than hack thousands of lines of memory
> registration caching code and stuff it into the kernel.

I would love matching in the spec. But I'm not sure it directly solves any
of the problems Jeff brought up in his talk at Sonoma. I can cope with
having to do matching in the MPI (I'm going to have that code anyway for TCP
networks). But it's the connection management, the memory pinning, and the
receive buffer space requirements that really drive us nuts and require the
bulk of our effort.

Brian

--
  Brian W. Barrett
  Dept. 1423: Scalable System Software
  Sandia National Laboratories
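
For readers less familiar with what "matching in the MPI" involves, here is a minimal sketch of software tag matching. It is an illustration only, not code from any MPI implementation or from the verbs API. The names `Matcher`, `Recv`, `post_recv`, and `deliver` are hypothetical, and the wildcards are merely modeled on MPI_ANY_SOURCE and MPI_ANY_TAG. It assumes messages arrive in order and ignores communicators, buffer sizes, and memory registration entirely.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

# Hypothetical wildcard values, modeled on MPI_ANY_SOURCE / MPI_ANY_TAG.
ANY_SOURCE = -1
ANY_TAG = -1


@dataclass
class Recv:
    """A posted receive; payload stays None until a message matches."""
    source: int
    tag: int
    payload: Optional[bytes] = None


class Matcher:
    """Two-queue matching: posted receives and unexpected messages."""

    def __init__(self):
        self.posted = deque()      # receives waiting for a message
        self.unexpected = deque()  # (source, tag, payload) waiting for a receive

    @staticmethod
    def _matches(recv, source, tag):
        return (recv.source in (ANY_SOURCE, source)
                and recv.tag in (ANY_TAG, tag))

    def post_recv(self, source, tag):
        """Post a receive; complete it at once if a message already arrived."""
        recv = Recv(source, tag)
        for i, (src, t, payload) in enumerate(self.unexpected):
            if self._matches(recv, src, t):
                del self.unexpected[i]  # oldest matching message wins
                recv.source, recv.tag, recv.payload = src, t, payload
                return recv
        self.posted.append(recv)
        return recv

    def deliver(self, source, tag, payload):
        """Handle an incoming message; return the matched receive or None."""
        for i, recv in enumerate(self.posted):
            if self._matches(recv, source, tag):
                del self.posted[i]  # oldest matching receive wins
                recv.source, recv.tag, recv.payload = source, tag, payload
                return recv
        self.unexpected.append((source, tag, payload))
        return None
```

A transport with tag matching in hardware or in its driver library would do the equivalent of `deliver` below the MPI layer. Over a transport without it, the MPI library runs this logic itself on every arrival, in addition to the connection management and memory pinning work discussed above.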
