Hey Sasha,

Attached are some numbers from a recent run I did with my port-offsetting patches. I ran with mvapich 0.9.9 and OpenMPI 1.2.6 on 120 nodes, using either 1 task per node or 8 tasks per node (the nodes have 8 processors each), trying LMC=0, LMC=1, and LMC=2 with the original 'updn', then LMC=1 and LMC=2 with my port-offsetting patch (labeled "PO"). Next to these columns are the percentages by which the numbers are worse than the LMC=0 baseline. My understanding is that mvapich 0.9.9 does not know how to take advantage of multiple lids, while OpenMPI 1.2.6 does.
I think the key numbers to notice are that, without port offsetting, performance relative to LMC=0 is pretty bad when the MPI implementation does not know how to take advantage of multiple lids (mvapich 0.9.9): LMC=1 shows ~30% degradation and LMC=2 shows ~90% degradation on this cluster. With port offsetting turned on, the degradation falls to 0-6%, with a few runs even coming out faster; we consider this within "noise" levels. For MPIs that do know how to take advantage of multiple lids, the port-offsetting patch doesn't seem to affect performance much (see the OpenMPI 1.2.6 sections).

PLMK what you think. (I've also appended a minimal code sketch of the offsetting idea below the quoted message, in case it helps.) Thanks.

Al

On Thu, 2008-04-10 at 14:10 -0700, Al Chu wrote:
> Hey Sasha,
>
> I was going to submit this after I had a chance to test on one of our
> big clusters to see if it worked 100% right. But my final testing has
> been delayed (for a month now!). Ira said some folks from Sonoma were
> interested in this, so I'll go ahead and post it.
>
> This is a patch for something I call "port_offsetting" (the name and
> description of the option are open to suggestion). Basically, we want
> to move to using LMC > 0 on our clusters because some of the newer MPI
> implementations take advantage of multiple lids and have shown faster
> performance when LMC > 0.
>
> The problem is that users who do not use the newer MPI implementations,
> or do not run their code in a way that can take advantage of multiple
> lids, suffer great performance degradation. We determined that the
> primary issue is what we started calling "base lid alignment". Here's a
> simple example.
>
> Assume LMC = 2, so each port gets 2^2 = 4 lids, and we are trying to
> route the lids of 4 ports (A, B, C, D). Those lids are:
>
> port A - 1,2,3,4
> port B - 5,6,7,8
> port C - 9,10,11,12
> port D - 13,14,15,16
>
> Suppose forwarding of these lids goes through 4 switch ports. If we
> cycle through the ports like updn/minhop currently do, we would see
> something like this:
>
> switch port 1: 1, 5, 9, 13
> switch port 2: 2, 6, 10, 14
> switch port 3: 3, 7, 11, 15
> switch port 4: 4, 8, 12, 16
>
> Note that the base lid of each port (lids 1, 5, 9, 13) goes through
> only 1 port of the switch. Thus a user who uses only the base lids is
> using only 1 of the 4 switch ports they could be using, leading to
> terrible performance.
>
> We want to get this instead:
>
> switch port 1: 1, 8, 11, 14
> switch port 2: 2, 5, 12, 15
> switch port 3: 3, 6, 9, 16
> switch port 4: 4, 7, 10, 13
>
> where the base lids are distributed more evenly.
>
> To do this, we (effectively) iterate through all the switch ports as
> before, but we start the iteration at a different index depending on
> the number of paths we have routed thus far.
>
> On one of our clusters, testing has shown that when we run with LMC=1
> and 1 task per node, mpibench (AlltoAll tests) ranges from 10-30% worse
> than when LMC=0 is used. With LMC=2, mpibench tends to be 50-70% worse
> in performance than with LMC=0.
>
> With the port-offsetting option, the performance degradation ranges
> from 1-5% worse than LMC=0. I am currently at a loss as to why I cannot
> get it even with LMC=0, but 1-5% is small enough not to make users
> mad :-)
>
> The part I haven't been able to test yet is whether newer MPIs that do
> take advantage of LMC > 0 run equally well with my port_offsetting
> turned off and on.
>
> Thanks, look forward to your comments,
>
> Al
>
> --
> Albert Chu
> [EMAIL PROTECTED]
> 925-422-5311
> Computer Scientist
> High Performance Systems Division
> Lawrence Livermore National Laboratory
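P.S. To make the offsetting idea above concrete, here is a minimal
standalone sketch in C. This is NOT the actual patch (the real code
lives inside opensm's routing logic), and the names here (assign_lids,
NUM_SWITCH_PORTS, and so on) are made up for illustration. It rebuilds
the two lid-to-switch-port tables from the example, using "advance the
starting index as paths are routed" as the offsetting rule:

#include <stdio.h>

#define LMC              2
#define NUM_END_PORTS    4              /* ports A, B, C, D */
#define NUM_SWITCH_PORTS 4
#define LIDS_PER_PORT    (1 << LMC)     /* LMC = 2 -> 4 lids per port */

static void assign_lids(int use_offset)
{
	/* table[switch port][end port] = lid routed through that switch port */
	int table[NUM_SWITCH_PORTS][NUM_END_PORTS];
	int paths_done = 0;  /* paths routed so far, across all end ports */
	int ep, i;

	for (ep = 0; ep < NUM_END_PORTS; ep++) {
		int base_lid = ep * LIDS_PER_PORT + 1;  /* A=1, B=5, C=9, D=13 */

		/*
		 * Without offsetting, every end port starts its round-robin
		 * at switch port 0, so every base lid lands on switch port 0
		 * ("base lid alignment").  With offsetting, the starting
		 * index advances as paths are routed: here, by one for each
		 * end port already fully routed.
		 */
		int start = use_offset ?
			(paths_done / LIDS_PER_PORT) % NUM_SWITCH_PORTS : 0;

		for (i = 0; i < LIDS_PER_PORT; i++) {
			table[(start + i) % NUM_SWITCH_PORTS][ep] = base_lid + i;
			paths_done++;
		}
	}

	for (i = 0; i < NUM_SWITCH_PORTS; i++) {
		printf("switch port %d:", i + 1);
		for (ep = 0; ep < NUM_END_PORTS; ep++)
			printf(" %2d", table[i][ep]);
		printf("\n");
	}
}

int main(void)
{
	printf("plain round-robin (all base lids on switch port 1):\n");
	assign_lids(0);
	printf("\nwith port offsetting (base lids spread out):\n");
	assign_lids(1);
	return 0;
}

The first printout reproduces the aligned table (1, 5, 9, 13 all on
switch port 1, the bad case) and the second reproduces the offset table
quoted above.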
mpi_port_offsetting.xls
Description: MS-Excel spreadsheet
