On 08/09/2010 14:02, Jeff Squyres wrote:
> On Sep 3, 2010, at 3:38 PM, George Bosilca wrote:
>
>> However, going over the existing BTLs I can see that some BTLs do not
>> correctly set this value:
>>
>> BTL      Bandwidth        Auto-detect  Status
>> Elan     2000             NO           Correct
>> GM       250              NO           Doubtful
>> MX       2000/10000       YES (Mbs)    Correct (before the patch)
>> OFUD     800              NO           Doubtful
>> OpenIB   2000/4000/8000   YES (Mbs)    Correct (multiplied by the active_width)
>> Portals  1000             NO           Doubtful
>> SCTP     100              NO           Conservative value (correct)
>> Self     100              XXX          Correct (doesn't matter anyway)
>> SM       9000             NO           Correct
>> TCP      100              NO           Conservative value (correct)
>> UDAPL    225              NO           Incorrect
>
> Now that that patch has been rolled back out, did we come to a conclusion here?
>
> - OFUD: why do we still even have this?
> - Portals: does it matter if it gets it wrong? No one will ever multi-rail
>   with it.
> - TCP: we can add auto-detect code for this (But doesn't have to be right
>   away -- i.e., don't make 1.5.0 wait for it).
> - UDAPL: I don't think anyone will multi-rail udapl with anything.
>
> Was the *real* problem that Brice's OpenFabrics bandwidth was auto-detected
> incorrectly somehow?
>
The first problem came from IB not autodetecting at all by default and using 800 Mbit/s instead. When I force autodetect with MCA parameters, the detected bandwidths are not perfect, but not too bad. When I force IB manually to the right bandwidth value, I can tweak things as needed.

Brice
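
To illustrate why a wrong default matters for multi-rail, here is a minimal sketch. It assumes traffic is split across BTLs in proportion to their reported bandwidth; that rule, the function name `stripe_weights`, and the pairing of OpenIB with MX on one node are assumptions made for illustration, not taken from the Open MPI source. The bandwidth figures (800, 8000 and 10000 Mbit/s) come from the discussion above.

```python
def stripe_weights(bandwidths):
    """Return each BTL's share of traffic, proportional to its reported bandwidth.

    `bandwidths` maps a BTL name to its reported bandwidth in Mbit/s.
    Proportional splitting is an assumption of this sketch.
    """
    total = sum(bandwidths.values())
    if total <= 0:
        raise ValueError("at least one BTL must report a positive bandwidth")
    return {name: bw / total for name, bw in bandwidths.items()}


# Hypothetical two-rail node: IB stuck at the 800 Mbit/s default.
default = stripe_weights({"openib": 800, "mx": 10000})

# Same node with IB set to 8000 Mbit/s, one of the values in the table.
tuned = stripe_weights({"openib": 8000, "mx": 10000})

for label, weights in (("default", default), ("tuned", tuned)):
    shares = ", ".join(f"{name}: {share:.1%}" for name, share in weights.items())
    print(f"{label}: {shares}")
```

Under this assumption, the 800 Mbit/s default leaves the IB rail with well under a tenth of the traffic, while a realistic value brings it close to half.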
