hi sven, the data is not corrupted. mmfsck compares 2 inodes and reports
that they don't match, but checking the data with tsdbfs reveals they are
equal. (one replica has to be fetched over the network; the nsds cannot
access all disks)
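for illustration, this is roughly what the check looks like. the
filesystem name and inode number are placeholders, and tsdbfs is an
unsupported debug tool, so take the exact invocation as an assumption
that may differ per release:

  # read-only check: report inode replica mismatches, fix nothing
  mmfsck fs0 -n
  # dump one of the flagged inodes by hand (12345 is a placeholder)
  echo "inode 12345" | tsdbfs fs0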
with some nsdChksum... settings we get a lot of "Encountered XYZ checksum
errors on network I/O to NSD Client disk" messages during this mmfsck.
ibm support says these are hardware issues, but that wrt mmfsck they are
false positives.

anyway, our current question is: if these are hardware issues, is there
anything in gpfs client->nsd (on the network side) that would detect such
errors? ie can we trust the data (and metadata)?

i was under the impression that client to disk is not covered, but i
assumed that at least client to nsd (the network part) was checksummed.

stijn
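ps: for reference, the settings alluded to above are presumably along
these lines; the parameter names are the ones documented in later scale
releases, so treat them as an assumption and check with ibm before
relying on them:

  # enable checksums on the client <-> nsd server network path
  mmchconfig nsdCksumTraditional=yes -i
  # dump the offending buffers whenever a checksum mismatch is detected
  mmchconfig nsdDumpBuffersOnCksumError=yes -i
  # verify what is set
  mmlsconfig nsdCksumTraditional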
On 08/02/2017 09:10 PM, Sven Oehme wrote:
> ok, i think i understand now, the data was already corrupted. the config
> change i proposed only prevents a potentially known future on-the-wire
> corruption; it will not fix something that made it to the disk already.
>
> Sven
>
> On Wed, Aug 2, 2017 at 11:53 AM Stijn De Weirdt <[email protected]>
> wrote:
>
>> yes ;)
>>
>> the system is in preproduction, so nothing that can't be stopped/started
>> in a few minutes (the current setup has only 4 nsds, and no clients).
>> mmfsck triggers the errors very early during the inode replica compare.
>>
>> stijn
>>
>> On 08/02/2017 08:47 PM, Sven Oehme wrote:
>>> How can you reproduce this so quickly?
>>> Did you restart all daemons after that?
>>>
>>> On Wed, Aug 2, 2017, 11:43 AM Stijn De Weirdt <[email protected]>
>>> wrote:
>>>
>>>> hi sven,
>>>>
>>>>> the very first thing you should check is if you have this setting set:
>>>> maybe the very first thing to check should be the faq/wiki that has
>>>> this documented?
>>>>
>>>>> mmlsconfig envVar
>>>>>
>>>>> envVar MLX4_POST_SEND_PREFER_BF 0 MLX4_USE_MUTEX 1 MLX5_SHUT_UP_BF 1
>>>>> MLX5_USE_MUTEX 1
>>>>>
>>>>> if that doesn't come back the way above, you need to set it:
>>>>>
>>>>> mmchconfig envVar="MLX4_POST_SEND_PREFER_BF=0 MLX5_SHUT_UP_BF=1
>>>>> MLX5_USE_MUTEX=1 MLX4_USE_MUTEX=1"
>>>> i just set this (it wasn't set before), but the problem is still
>>>> present.
>>>>
>>>>> there was a problem in the Mellanox FW in various versions that was
>>>>> never completely addressed (bugs were found and fixed, but it was
>>>>> never fully proven to be addressed). the above environment variables
>>>>> turn on code in the mellanox driver that prevents this potential code
>>>>> path from being used to begin with.
>>>>>
>>>>> in Spectrum Scale 4.2.4 (not yet released) we added a workaround in
>>>>> Scale so that even if you don't set these variables the problem can't
>>>>> happen anymore. until then the only choice you have is the envVar
>>>>> above (which btw ships as default on all ESS systems).
>>>>>
>>>>> you also should be on the latest available Mellanox FW & drivers, as
>>>>> not all versions even have the code that is activated by the
>>>>> environment variables above; i think at a minimum you need to be at
>>>>> 3.4, but i don't remember the exact version. there have been multiple
>>>>> defects opened around this area; the last one i remember was:
>>>> we run mlnx ofed 4.1. the fw is not the latest: we have edr cards from
>>>> dell, and their fw is a bit behind. i'm trying to convince dell to
>>>> make a new one. mellanox used to allow you to make your own, but they
>>>> don't anymore.
>>>>
>>>>> 00154843 : ESS ConnectX-3 performance issue - spinning on
>>>>> pthread_spin_lock
>>>>>
>>>>> you may ask your mellanox representative if they can get you access
>>>>> to this defect. while it was found on ESS, meaning on PPC64 and with
>>>>> ConnectX-3 cards, it's a general issue that affects all cards, on
>>>>> intel as well as Power.
>>>> ok, thanks for this. maybe such a reference is enough for dell to
>>>> update their firmware.
>>>>
>>>> stijn
>>>>
>>>>> On Wed, Aug 2, 2017 at 8:58 AM Stijn De Weirdt
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> hi all,
>>>>>>
>>>>>> is there any documentation wrt data integrity in spectrum scale:
>>>>>> assuming a crappy network, does gpfs guarantee somehow that data
>>>>>> written by a client ends up safe in the nsd gpfs daemon, and
>>>>>> similarly from the nsd gpfs daemon to disk?
>>>>>>
>>>>>> and wrt a crappy network, what about rdma on a crappy network? is it
>>>>>> the same?
>>>>>>
>>>>>> (we are hunting down a crappy infiniband issue; ibm support says
>>>>>> it's a network issue, and we see no errors anywhere...)
>>>>>>
>>>>>> thanks a lot,
>>>>>>
>>>>>> stijn
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
