Sasha Khapyorsky wrote:
> Hi Nicolas,
>
> On 11:12 Thu 19 Mar     , Nicolas Morey Chaisemartin wrote:
>> First question and an easy one:
>> By default, with what optimization options is OpenSM compiled?
>
> It is defined in autoconf I guess.
>
>> Without any specific options using git, there is no -O2 or such, so all the
>> inline functions are not inlined, which has a huge negative impact on
>> performance (a few million calls to osm_switch_get_least_hops, not inlined,
>> consume over 15% of computing time)
>
> In my setup I have in /usr/share/autoconf/autoconf/c.m4:
>
>   if test "$GCC" = yes; then
>     CFLAGS="-g -O2"
>
> And -O2 is turned on by default.

This doesn't seem to be the case for me; I'll check where that comes from.

>> Next one and a bit harder:
>> In the Fat-Tree we have a 2D array for the hop table (destination lid/port num).
>> Why is this table allocated as we need it and not all at once?
>
> Yes, I think it can be preallocated. Just note that this should not have
> port arrays for each lid in a fabric, but only for switches.

Good. There should be one table for each switch, and each of them has one entry per (port, lid) pair, no?

>> And is it really necessary to check each time if the lid we use is not
>> greater than max_lid_ho ?
>
> May be not, but I will need to check carefully.
>
>> The only reason I would see for this is if a new node/switch with a bigger
>> lid was added to the fabric while openSM is routing. In such a case,
>> wouldn't a lock protect the variables so new lid can't appear/disappear
>> while it calculates the routes ?
>
> Routing calculation cannot happen in parallel with discovery (it is all
> serialized in do_sweep() function), we should be protected at least in
> this part.
>
>> If yes, we could allocate all and skip a lot of checks. We have millions of
>> calls to malloc and memset in osm_switch_set_hops plus tests in
>> get_hops/get_least_hops.
>
> malloc() calls are conditional, there could be many checks, but only
> "needed" amount of malloc() itself.

We don't do more than necessary, but it costs a lot of time to make millions of malloc()/memset() calls.

>> This may cost a bit more memory,
>
> Min hop's port arrays preallocation should not cost any extra memory (if
> done properly - for switches only) - we are allocating all needed buffers
> in routing calculation time anyway.
>
>> but easily gain 15% on routing computation time.
>
> Well, I'm skeptical about 15% :). But it doesn't really matter - even
> 1% performance gain and/or cleaner code would be nice improvement.
>

Well, about 15% of computation time (for Ftree at least) is spent in set_hops, specifically in the malloc/memset part. If we do it only once at the start, it should probably be much faster!

By the way, is there a reason you don't use the likely/unlikely macros in conditions?

Regards

Nicolas
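
To make the two suggestions above concrete, here is a rough sketch of what I have in mind. It is not OpenSM code: hop_table_t, hop_table_init() and the other names are hypothetical, and the 0xFF "unreachable" marker is an assumption for illustration. It allocates the per-switch (lid, port) hop table once, up front, and wraps the remaining bounds checks in likely/unlikely hints built on GCC's __builtin_expect (the macros fall back to no-ops on other compilers).

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Branch hints: GCC/Clang builtin, plain expressions elsewhere. */
#if defined(__GNUC__)
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define likely(x)   (x)
#define unlikely(x) (x)
#endif

#define HOPS_UNREACHABLE 0xFF /* hypothetical "no route" marker */

/* Hypothetical per-switch hop table: one byte per (lid, port) pair. */
typedef struct {
	uint16_t max_lid;   /* highest LID covered by the table */
	uint8_t num_ports;  /* number of ports on this switch */
	uint8_t *hops;      /* (max_lid + 1) * num_ports entries, one row per LID */
} hop_table_t;

/* Allocate and fill the whole table once, before routing starts. */
static int hop_table_init(hop_table_t *t, uint16_t max_lid, uint8_t num_ports)
{
	size_t n = ((size_t)max_lid + 1) * num_ports;

	if (n == 0)
		return -1;
	t->hops = malloc(n);
	if (unlikely(t->hops == NULL))
		return -1;
	memset(t->hops, HOPS_UNREACHABLE, n);
	t->max_lid = max_lid;
	t->num_ports = num_ports;
	return 0;
}

/* Store a hop count; no allocation on this path, only a hinted bounds check. */
static inline void hop_table_set(hop_table_t *t, uint16_t lid, uint8_t port,
				 uint8_t hops)
{
	if (likely(lid <= t->max_lid && port < t->num_ports))
		t->hops[(size_t)lid * t->num_ports + port] = hops;
}

/* Read a hop count; out-of-range lookups report "unreachable". */
static inline uint8_t hop_table_get(const hop_table_t *t, uint16_t lid,
				    uint8_t port)
{
	if (unlikely(lid > t->max_lid || port >= t->num_ports))
		return HOPS_UNREACHABLE;
	return t->hops[(size_t)lid * t->num_ports + port];
}

/* Smallest hop count over all ports for one destination LID. */
static inline uint8_t hop_table_get_least(const hop_table_t *t, uint16_t lid)
{
	uint8_t best = HOPS_UNREACHABLE;
	unsigned int p;

	if (unlikely(lid > t->max_lid))
		return best;
	for (p = 0; p < t->num_ports; p++) {
		uint8_t h = t->hops[(size_t)lid * t->num_ports + p];
		if (h < best)
			best = h;
	}
	return best;
}

static void hop_table_free(hop_table_t *t)
{
	free(t->hops);
	t->hops = NULL;
}
```

With this layout a switch would call hop_table_init() once per routing pass and then only hop_table_set()/hop_table_get() in the hot loops. If max_lid really cannot change during routing, the bounds checks could be dropped or turned into debug-only assertions. Whether the hints help measurably would need profiling.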
