On Monday 12 May 2008 07:37:54 pm Jeff Squyres wrote:
> Short version:
> --------------
>
> I propose that we should disallow multiple different
> mca_btl_openib_receive_queues values (or receive_queues values from
> the INI file) to be used in a single MPI job for the v1.3 series.
>
> More details:
> -------------
>
> The reason I'm looking into this heterogeneity stuff is to help
> Chelsio support their iWARP NIC in OMPI. Their NIC needs a specific
> value for mca_btl_openib_receive_queues (specifically: it does not
> support SRQ and it has the wireup race condition that we discussed
> before).
>
> The major problem is that all the BSRQ information is currently stored
> on the openib component -- it is *not* maintained on a per-HCA (or
> per-port) basis. We *could* move all the BSRQ info to live on the
> hca_t struct (or even the openib module struct), but doing so has at
> least 3 big consequences:
>
> 1. It would touch a lot of code. But touching all this code is
> relatively low risk; it will be easy to check for correctness because
> the changes will either compile or not.
>
> 2. There are functions (some of which are static inline) that read the
> BSRQ data. These functions would have to take an additional (hca_t*)
> (or (btl_openib_module_t*)) parameter.
>
> 3. Getting to the BSRQ info will take at least 1 or 2 more
> dereferences (e.g., module->hca->bsrq_info or module->bsrq_info...).
>
> I'm not too concerned about #1 (it's grunt work), but I am a bit
> concerned about #2 and #3 since at least some of these places are in
> the critical performance path.
>
> Given these concerns, I propose the following for v1.3:
>
> - Add a "receive_queues" field to the INI file so that the Chelsio
> adapter can run out of the box (i.e., "mpirun -np 4 a.out" with hosts
> containing Chelsio NICs will get a value for btl_openib_receive_queues
> that will work).
>
> - NetEffect NICs will also require overriding
> btl_openib_receive_queues, but will likely have a different value than
> Chelsio NICs (they don't have the wireup race condition that Chelsio
> does).
>
> - Because the BSRQ info is on the component (i.e., global), we should
> detect when multiple different receive_queues values are specified and
> gracefully abort.
How would we verify that the remote receive_queues values are the same? By
passing around the receive_queues values in the modex (which I thought we
were trying to reduce), or would we pass this around during CPC setup (for
those that can support this)?

> I think it'll be quite uncommon to have a need for two different
> receive_queues values, and that this proposal will be fine for v1.3.
>
> Comments?

Sounds reasonable to me.

> On May 12, 2008, at 6:44 PM, Jeff Squyres wrote:
> > After looking at the code a bit, I realized that I completely forgot
> > that the INI file was invented to solve at least the
> > heterogeneous-adapters-in-a-host problem.
> >
> > So I amended the ticket to reflect that that problem is already
> > solved. The other part is not, though -- consider two MPI procs on
> > different hosts, each with an iWARP NIC, but one NIC supports SRQs and
> > one does not.
> >
> > On May 12, 2008, at 5:36 PM, Jeff Squyres wrote:
> >> I think that this issue has come up before, but I filed a ticket
> >> about it because at least one developer (Jon) has a system with both
> >> IB and iWARP adapters:
> >>
> >> https://svn.open-mpi.org/trac/ompi/ticket/1282
> >>
> >> My question: do we care about the heterogeneous adapter scenarios?
> >> For v1.3? For v1.4? For ...some version in the future?
> >>
> >> I think the first issue I identified in the ticket is grunt work to
> >> fix (annoying and tedious, but not difficult), but the second one
> >> will be a little dicey -- it has scalability issues (e.g., sending
> >> around all info in the modex, etc.).
> >>
> >> --
> >> Jeff Squyres
> >> Cisco Systems
> >
> > --
> > Jeff Squyres
> > Cisco Systems
> >
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel