Hmmmm... since I have no control over nor involvement in what gets sent, perhaps I can be a disinterested third party. ;-)
Could you perhaps explain this comment:

> BTW I looked at how we do modex now on the trunk. For the OOB case more
> than half the data we send for each proc is garbage.

What "garbage" are you referring to? I am working to remove the stuff
inserted by proc.c - mostly hostname, hopefully arch, etc. If you are
running a "debug" version, there will also be type descriptors for each
entry, but those are eliminated for optimized builds.

So are you referring to other things?

Thanks
Ralph


On 4/3/08 6:52 AM, "Gleb Natapov" <gl...@voltaire.com> wrote:

> On Wed, Apr 02, 2008 at 08:41:14PM -0400, Jeff Squyres wrote:
>>>> that it's the same for all procs on all hosts. I guess there are a few
>>>> cases:
>>>>
>>>> 1. homogeneous include/exclude, no carto: send all in node info; no
>>>> proc info
>>>> 2. homogeneous include/exclude, carto is used: send all ports in node
>>>> info; send index in proc info for which node info port index it
>>>> will use
>>> This may actually increase modex size. Think about two procs using two
>>> different HCAs. We'll send all the data we send today + indexes.
>>
>> It'll increase it compared to the optimization that we're about to
>> make. But it will certainly be a large decrease compared to what
>> we're doing today.
>
> Maybe I don't understand something in what you propose then. Currently,
> when I run two procs on the same node and each proc uses a different HCA,
> each one of them sends a message that describes the HCA in use by that
> proc. The message is of the form <mtu, subnet, lid, apm_lid, cpc>.
> Each proc sends one of those, so there are two messages total on the wire.
> You propose that one of them should send a description of both
> available ports (that is, one of them sends two messages of the form
> above) and then each proc sends an additional message with the index of
> the HCA that it is going to use. And this is more data on the wire after
> the proposed optimization than we have now.
>
>> (see the spreadsheet that I sent last week).
> I've looked at it but I could not decipher it :( I don't understand
> where all these numbers come from.
>
>> Indeed, we can even put in the optimization that if there's only one
>> process on a host, it can only publish the ports that it will use (and
>> therefore there's no need for the proc data).
> More special cases :(
>
>>>> 3. heterogeneous include/exclude, no carto: need user to tell us that
>>>> this situation exists (e.g., use another MCA param), but then it is
>>>> the same as #2
>>>> 4. heterogeneous include/exclude, carto is used: same as #3
>>>>
>>>> Right?
>>>>
>>> Looks like it. FWIW I don't like the idea of coding all those special
>>> cases. The way it works now, I can be pretty sure that any crazy setup
>>> I come up with will work.
>>
>> And so it will with the new scheme. The only place it won't work is
>> if the user specifies a heterogeneous include/exclude (i.e., we'll
>> require that the user tells us when they do that), which nobody does.
>>
>> I guess I don't see the problem...?
> I like things to be simple. The KISS principle, I guess. And I do care
> about heterogeneous include/exclude too.
>
> BTW I looked at how we do modex now on the trunk. For the OOB case more
> than half the data we send for each proc is garbage.
>
>>> By the way, how much data is moved during the modex stage? What if the
>>> modex used compression?
>>
>> The spreadsheet I listed was just the openib part of the modex, and it
>> was fairly hefty. I have no idea how well (or not) it would compress.
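To make the two-HCA size comparison above concrete, here is a minimal sketch of the per-port message Gleb describes (<mtu, subnet, lid, apm_lid, cpc>) and the byte counts under the current and proposed schemes. The struct layout and field widths are illustrative assumptions, not the actual openib modex encoding.

    /* Illustrative sketch only -- field widths are assumptions, not the
     * actual openib modex encoding. */
    #include <stdint.h>
    #include <stdio.h>

    struct modex_port_msg {      /* "<mtu, subnet, lid, apm_lid, cpc>" */
        uint32_t mtu;            /* active MTU */
        uint64_t subnet_id;      /* subnet prefix */
        uint16_t lid;            /* port LID */
        uint16_t apm_lid;        /* alternate-path LID */
        uint8_t  cpc;            /* connect pseudo-component id (assumed 1 byte) */
    };

    int main(void)
    {
        size_t per_port = sizeof(struct modex_port_msg);
        size_t index    = sizeof(uint8_t);           /* per-proc "which port" index */

        /* Gleb's example: two procs on one node, each using a different HCA. */
        size_t today    = 2 * per_port;              /* each proc publishes its own port */
        size_t proposed = 2 * per_port + 2 * index;  /* one proc publishes both ports,
                                                        and each proc adds an index */
        printf("today: %zu bytes, proposed: %zu bytes\n", today, proposed);
        return 0;
    }

Running this prints a slightly larger total for the proposed scheme, which is exactly Gleb's objection for this particular case.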
> I looked at what kind of data we send during the openib modex and created
> a file with 10000 openib modex messages. The mtu, subnet id, and cpc list
> were the same in each message, but lid/apm_lid were different; this is a
> pretty close approximation of the data that is sent from the HN to each
> process. The uncompressed file size is 489K; the compressed file size is
> 43K. More than 10 times smaller.
>
> --
> Gleb.
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
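The compression result above is easy to reproduce in spirit. Below is a rough reconstruction of the experiment, assuming zlib's one-shot compress() and the illustrative message layout from the earlier sketch; the real modex payload, and therefore the exact sizes and ratio, will differ.

    /* Sketch of the compression experiment: 10000 near-identical openib modex
     * messages where only lid/apm_lid vary.  Build with -lz. */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <zlib.h>

    #define NMSG 10000

    struct modex_port_msg {      /* illustrative layout, see earlier sketch */
        uint32_t mtu;
        uint64_t subnet_id;
        uint16_t lid;
        uint16_t apm_lid;
        uint8_t  cpc;
    };

    int main(void)
    {
        static struct modex_port_msg msgs[NMSG];

        for (int i = 0; i < NMSG; i++) {
            msgs[i].mtu       = 2048;                   /* identical in every message */
            msgs[i].subnet_id = 0xfe80000000000000ULL;  /* identical in every message */
            msgs[i].cpc       = 1;                      /* identical in every message */
            msgs[i].lid       = (uint16_t)(i + 1);      /* varies per message */
            msgs[i].apm_lid   = (uint16_t)(i + 2);      /* varies per message */
        }

        uLong  src_len  = (uLong)sizeof(msgs);
        uLongf dest_len = compressBound(src_len);
        Bytef *dest     = malloc(dest_len);
        if (NULL == dest ||
            Z_OK != compress(dest, &dest_len, (const Bytef *)msgs, src_len)) {
            fprintf(stderr, "compression failed\n");
            return 1;
        }
        printf("uncompressed: %lu bytes, compressed: %lu bytes (%.1fx smaller)\n",
               (unsigned long)src_len, (unsigned long)dest_len,
               (double)src_len / (double)dest_len);
        free(dest);
        return 0;
    }

Because almost every byte repeats from one message to the next, deflate collapses the stream dramatically, which is the effect Gleb measured (489K down to 43K).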