I was making basic performance measurements on our machine after installing 1.8.5, and the performance looked bad. It turns out that the smcuda BTL has a higher exclusivity than both vader and sm, even on machines with no NVIDIA adapters. Is there a strong reason why the default exclusivity is set so high? Of course, this can easily be fixed with a couple of MCA options, but unsuspecting users who “just run” will experience, according to my measurements, a 1/3 overhead across the board for shared-memory communication.
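A toy model may help show why a high default exclusivity matters. Roughly, Open MPI keeps, for each peer, only the reachable BTLs that report the highest exclusivity, so whichever component ranks highest "wins" on-node traffic. The sketch below is only an illustration under that assumption. The `Btl` class, the `select_btls` helper, and all exclusivity numbers are hypothetical; only the ordering (smcuda above vader and sm) reflects the situation described above. The `excluded` argument loosely mirrors excluding a component on the command line, e.g. `mpirun --mca btl ^smcuda ...`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Btl:
    name: str
    exclusivity: int  # hypothetical value; higher means preferred
    reachable: bool   # can this BTL reach the peer at all?


def select_btls(candidates, excluded=()):
    """Return names of reachable, non-excluded BTLs sharing the top exclusivity.

    A simplified model of per-peer BTL selection, not Open MPI's actual code.
    """
    usable = [b for b in candidates if b.reachable and b.name not in excluded]
    if not usable:
        return []
    top = max(b.exclusivity for b in usable)
    return [b.name for b in usable if b.exclusivity == top]


# Hypothetical exclusivities for an on-node peer. Only the relative order
# (smcuda above vader and sm) is meant to match the report.
ON_NODE = [
    Btl("smcuda", 300, True),
    Btl("vader", 250, True),
    Btl("sm", 200, True),
    Btl("tcp", 100, True),
]

if __name__ == "__main__":
    print(select_btls(ON_NODE))                      # smcuda wins by default
    print(select_btls(ON_NODE, excluded={"smcuda"})) # excluding it falls back
```

With these made-up numbers, smcuda is chosen even when no GPU is involved, and excluding it lets the next-highest component take over, which is the effect the MCA workaround relies on.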
Side note: from my understanding of the smcuda component, its performance should be identical to that of the regular sm component as long as no GPU operations are required. This is not the case: smcuda carries some performance penalty compared to sm.

Aurelien
--
Aurélien Bouteiller ~~ https://icl.cs.utk.edu/~bouteill/