ok, i think i understand now: the data was already corrupted. the config change i proposed only prevents a known kind of potential future on-the-wire corruption; it will not fix anything that has already made it to disk.
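for reference, a minimal sketch of the full sequence for that workaround (assuming the cluster can take a short daemon restart; mmshutdown/mmstartup with -a restart the daemons cluster-wide so the new environment is picked up):

  # check whether the workaround is already in place
  mmlsconfig envVar

  # if not, set it (same values as quoted below)
  mmchconfig envVar="MLX4_POST_SEND_PREFER_BF=0 MLX5_SHUT_UP_BF=1 MLX5_USE_MUTEX=1 MLX4_USE_MUTEX=1"

  # the daemons only pick up the new environment after a restart
  mmshutdown -a
  mmstartup -a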
Sven

On Wed, Aug 2, 2017 at 11:53 AM Stijn De Weirdt <stijn.dewei...@ugent.be> wrote:
> yes ;)
>
> the system is in preproduction, so nothing that can't be stopped/started
> in a few minutes (the current setup has only 4 nsds, and no clients).
> mmfsck triggers the errors very early, during the inode replica compare.
>
>
> stijn
>
> On 08/02/2017 08:47 PM, Sven Oehme wrote:
> > How can you reproduce this so quickly?
> > Did you restart all daemons after that?
> >
> > On Wed, Aug 2, 2017, 11:43 AM Stijn De Weirdt <stijn.dewei...@ugent.be>
> > wrote:
> >
> >> hi sven,
> >>
> >>
> >>> the very first thing you should check is if you have this setting set:
> >> maybe the very first thing to check should be the faq/wiki that has this
> >> documented?
> >>
> >>>
> >>> mmlsconfig envVar
> >>>
> >>> envVar MLX4_POST_SEND_PREFER_BF 0 MLX4_USE_MUTEX 1 MLX5_SHUT_UP_BF 1
> >>> MLX5_USE_MUTEX 1
> >>>
> >>> if that doesn't come back the way above, you need to set it:
> >>>
> >>> mmchconfig envVar="MLX4_POST_SEND_PREFER_BF=0 MLX5_SHUT_UP_BF=1
> >>> MLX5_USE_MUTEX=1 MLX4_USE_MUTEX=1"
> >> i just set this (it wasn't set before), but the problem is still present.
> >>
> >>>
> >>> there was a problem in the Mellanox FW in various versions that was
> >>> never completely addressed (bugs were found and fixed, but it was never
> >>> fully proven to be addressed). the above environment variables turn
> >>> code on in the mellanox driver that prevents this potential code path
> >>> from being used to begin with.
> >>>
> >>> in Spectrum Scale 4.2.4 (not yet released) we added a workaround in
> >>> Scale so that even if you don't set these variables the problem can't
> >>> happen anymore. until then the only choice you have is the envVar above
> >>> (which btw ships as default on all ESS systems).
> >>>
> >>> you also should be on the latest available Mellanox FW & Drivers, as
> >>> not all versions even have the code that is activated by the
> >>> environment variables above. i think at a minimum you need to be at 3.4
> >>> but i don't remember the exact version. there have been multiple
> >>> defects opened around this area; the last one i remember was:
> >> we run mlnx ofed 4.1; the fw is not the latest, but we have edr cards
> >> from dell, and the fw is a bit behind. i'm trying to convince dell to
> >> make a new one. mellanox used to allow you to make your own, but they
> >> don't anymore.
> >>
> >>>
> >>> 00154843 : ESS ConnectX-3 performance issue - spinning on
> >>> pthread_spin_lock
> >>>
> >>> you may ask your mellanox representative if they can get you access to
> >>> this defect. while it was found on ESS, meaning on PPC64 and with
> >>> ConnectX-3 cards, it's a general issue that affects all cards, on intel
> >>> as well as Power.
> >> ok, thanks for this. maybe such a reference is enough for dell to update
> >> their firmware.
> >>
> >> stijn
> >>
> >>>
> >>> On Wed, Aug 2, 2017 at 8:58 AM Stijn De Weirdt <stijn.dewei...@ugent.be>
> >>> wrote:
> >>>
> >>>> hi all,
> >>>>
> >>>> is there any documentation wrt data integrity in spectrum scale:
> >>>> assuming a crappy network, does gpfs guarantee somehow that data
> >>>> written by a client ends up safe in the nsd gpfs daemon, and similarly
> >>>> from the nsd gpfs daemon to disk?
> >>>>
> >>>> and wrt a crappy network, what about rdma on a crappy network? is it
> >>>> the same?
> >>>>
> >>>> (we are hunting down a crappy infiniband issue; ibm support says it's
> >>>> a network issue, and we see no errors anywhere...)
> >>>>
> >>>> thanks a lot,
> >>>>
> >>>> stijn
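and since the corruption is already on disk, one way to inspect (and, if needed, repair) it would be mmfsck on the unmounted filesystem, roughly like this (a sketch only; "fs0" is a placeholder device name, -n reports without changing anything, -y repairs):

  # unmount on all nodes, then run a read-only check first
  mmumount fs0 -a
  mmfsck fs0 -n -v

  # if the report looks sane, let it repair and remount
  mmfsck fs0 -y
  mmmount fs0 -a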
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss