How can you reproduce this so quickly? Did you restart all daemons after that?
On Wed, Aug 2, 2017, 11:43 AM Stijn De Weirdt <[email protected]> wrote:
> hi sven,
>
> > the very first thing you should check is if you have this setting set:
> maybe the very first thing to check should be the faq/wiki that has this
> documented?
>
> > mmlsconfig envVar
> >
> > envVar MLX4_POST_SEND_PREFER_BF 0 MLX4_USE_MUTEX 1 MLX5_SHUT_UP_BF 1 MLX5_USE_MUTEX 1
> >
> > if that doesn't come back the way above, you need to set it:
> >
> > mmchconfig envVar="MLX4_POST_SEND_PREFER_BF=0 MLX5_SHUT_UP_BF=1 MLX5_USE_MUTEX=1 MLX4_USE_MUTEX=1"
> i just set this (it wasn't set before), but the problem is still present.
>
> > there was a problem in the Mellanox FW in various versions that was never
> > completely addressed (bugs were found and fixed, but it was never fully
> > proven to be addressed). the above environment variables turn on code in
> > the mellanox driver that prevents this potential code path from being used
> > to begin with.
> >
> > in Spectrum Scale 4.2.4 (not yet released) we added a workaround in Scale
> > so that even if you don't set these variables, the problem can't happen
> > anymore. until then, the only choice you have is the envVar above (which,
> > btw, ships as default on all ESS systems).
> >
> > you also should be on the latest available Mellanox FW & drivers, as not
> > all versions even have the code that is activated by the environment
> > variables above. i think at a minimum you need to be at 3.4, but i don't
> > remember the exact version. there have been multiple defects opened around
> > this area; the last one i remember was:
> we run mlnx ofed 4.1; the fw is not the latest. we have edr cards from
> dell, and their fw is a bit behind. i'm trying to convince dell to make a
> new one. mellanox used to allow you to make your own, but they don't
> anymore.
> >
> > 00154843: ESS ConnectX-3 performance issue - spinning on pthread_spin_lock
> >
> > you may ask your mellanox representative if they can get you access to
> > this defect. while it was found on ESS, meaning on PPC64 with ConnectX-3
> > cards, it's a general issue that affects all cards, on Intel as well as
> > Power.
> ok, thanks for this. maybe such a reference is enough for dell to update
> their firmware.
>
> stijn
>
> > On Wed, Aug 2, 2017 at 8:58 AM Stijn De Weirdt <[email protected]>
> > wrote:
> >
> >> hi all,
> >>
> >> is there any documentation wrt data integrity in spectrum scale:
> >> assuming a crappy network, does gpfs somehow guarantee that data written
> >> by a client ends up safe in the nsd gpfs daemon, and similarly from the
> >> nsd gpfs daemon to disk?
> >>
> >> and wrt crappy network, what about rdma on a crappy network? is it the
> >> same?
> >>
> >> (we are hunting down a crappy infiniband issue; ibm support says it's a
> >> network issue, and we see no errors anywhere...)
> >>
> >> thanks a lot,
> >>
> >> stijn
> >> _______________________________________________
> >> gpfsug-discuss mailing list
> >> gpfsug-discuss at spectrumscale.org
> >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
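The check-then-set procedure Sven describes can be scripted. Below is a minimal Python sketch, not an IBM-supplied tool. It assumes `mmlsconfig envVar` prints a single line in the "envVar NAME VALUE NAME VALUE ..." layout quoted in the thread, and the helper names (parse_envvar_line, missing_settings, mmchconfig_command) are hypothetical. It only reports which of the four variables are absent or set to a different value, and builds an mmchconfig command with the same four settings; it does not run anything on the cluster.

```python
from __future__ import annotations

# Hypothetical helper, not part of Spectrum Scale. Assumes `mmlsconfig envVar`
# prints one line shaped like the example in the thread:
#   envVar NAME VALUE NAME VALUE ...

# Values recommended in the thread.
REQUIRED = {
    "MLX4_POST_SEND_PREFER_BF": "0",
    "MLX4_USE_MUTEX": "1",
    "MLX5_SHUT_UP_BF": "1",
    "MLX5_USE_MUTEX": "1",
}


def parse_envvar_line(line: str) -> dict[str, str]:
    """Turn 'envVar A 0 B 1' into {'A': '0', 'B': '1'}."""
    tokens = line.split()
    if not tokens or tokens[0] != "envVar":
        return {}  # treat anything else as "nothing set"
    pairs = tokens[1:]
    # Assumes names and values strictly alternate; a dangling name is ignored.
    return {pairs[i]: pairs[i + 1] for i in range(0, len(pairs) - 1, 2)}


def missing_settings(line: str) -> dict[str, str]:
    """Return required variables that are absent or set to another value."""
    current = parse_envvar_line(line)
    return {k: v for k, v in REQUIRED.items() if current.get(k) != v}


def mmchconfig_command() -> str:
    """Build an mmchconfig invocation with the four settings from the thread."""
    settings = " ".join(f"{k}={v}" for k, v in REQUIRED.items())
    return f'mmchconfig envVar="{settings}"'


if __name__ == "__main__":
    # Sample line copied from the thread; on a real cluster, feed in the
    # captured `mmlsconfig envVar` output instead.
    sample = ("envVar MLX4_POST_SEND_PREFER_BF 0 MLX4_USE_MUTEX 1 "
              "MLX5_SHUT_UP_BF 1 MLX5_USE_MUTEX 1")
    todo = missing_settings(sample)
    if todo:
        print("missing or wrong:", todo)
        print(mmchconfig_command())
    else:
        print("all four variables match the values from the thread")
```

The generated command lists the variables in a different order than the one quoted above; the sketch assumes the order inside envVar does not matter. Keep Sven's question at the top of the thread in mind: he asks whether all daemons were restarted after the change.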
