hi sven,

> before i answer the rest of your questions, can you share what version of
> GPFS exactly you are on? mmfsadm dump version would be the best source for
> that.

it returns Build branch "4.2.3.3 ".

> if you have 2 inodes and you know the exact address of where they are
> stored on disk, one could 'dd' them off the disk and compare if they are
> really equal.

ok, i can try that later. are you suggesting that "tsdbfs comp" might have
given wrong results? because we ran that and got e.g.

> # tsdbfs somefs comp 7:5137408 25:221785088 1024
> Comparing 1024 sectors at 7:5137408 = 0x7:4E6400 and 25:221785088 = 0x19:D382C00:
> All sectors identical
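if i read the addresses right, something like this should do the dd
comparison (a rough sketch, untested: /dev/sdX and /dev/sdY are placeholders
for whatever devices nsd 7 and nsd 25 map to, cf. mmlsnsd -m; the tsdbfs
addresses are in 512-byte sectors):

  # pull 1024 sectors (512 bytes each) of each replica off its disk
  dd if=/dev/sdX bs=512 skip=5137408 count=1024 of=/tmp/replica1
  dd if=/dev/sdY bs=512 skip=221785088 count=1024 of=/tmp/replica2
  # byte-for-byte compare; no output means identical
  cmp /tmp/replica1 /tmp/replica2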
> we only support checksums when you use GNR based systems, they cover
> network as well as disk side for that.
> the nsdchecksum code you refer to is the one i mentioned above that's only
> supported with GNR; at least i am not aware that we ever claimed it to be
> supported outside of it, but i can check that.

ok, maybe i'm a bit confused. we have a GNR too, but it's not this one, and
they are not in the same gpfs cluster. i thought the GNR extended the
checksumming to disk, and that it was already there for the network part.
thanks for clearing this up. but that is worse than i thought...

stijn

> sven
>
> On Wed, Aug 2, 2017 at 12:20 PM Stijn De Weirdt <stijn.dewei...@ugent.be>
> wrote:
>
>> hi sven,
>>
>> the data is not corrupted. mmfsck compares 2 inodes, says they don't
>> match, but checking the data with tsdbfs reveals they are equal.
>> (one replica has to be fetched over the network; the nsds cannot access
>> all disks)
>>
>> with some nsdChksum... settings we get during this mmfsck a lot of
>> "Encountered XYZ checksum errors on network I/O to NSD Client disk"
>>
>> ibm support says these are hardware issues, but wrt mmfsck they are
>> false positives.
>>
>> anyway, our current question is: if these are hardware issues, is there
>> anything in gpfs client->nsd (on the network side) that would detect
>> such errors? ie can we trust the data (and metadata)?
>> i was under the impression that client to disk is not covered, but i
>> assumed that at least client to nsd (the network part) was checksummed.
>>
>> stijn
>>
>> On 08/02/2017 09:10 PM, Sven Oehme wrote:
>>> ok, i think i understand now, the data was already corrupted. the config
>>> change i proposed only prevents a potentially known future on-the-wire
>>> corruption; this will not fix something that made it to the disk already.
>>>
>>> Sven
>>>
>>> On Wed, Aug 2, 2017 at 11:53 AM Stijn De Weirdt <stijn.dewei...@ugent.be>
>>> wrote:
>>>
>>>> yes ;)
>>>>
>>>> the system is in preproduction, so nothing that can't be stopped/started
>>>> in a few minutes (current setup has only 4 nsds, and no clients).
>>>> mmfsck triggers the errors very early during inode replica compare.
>>>>
>>>> stijn
>>>>
>>>> On 08/02/2017 08:47 PM, Sven Oehme wrote:
>>>>> How can you reproduce this so quickly?
>>>>> Did you restart all daemons after that?
>>>>>
>>>>> On Wed, Aug 2, 2017, 11:43 AM Stijn De Weirdt <stijn.dewei...@ugent.be>
>>>>> wrote:
>>>>>
>>>>>> hi sven,
>>>>>>
>>>>>>> the very first thing you should check is if you have this setting
>>>>>>> set:
>>>>>>
>>>>>> maybe the very first thing to check should be the faq/wiki that has
>>>>>> this documented?
>>>>>>
>>>>>>> mmlsconfig envVar
>>>>>>>
>>>>>>> envVar MLX4_POST_SEND_PREFER_BF 0 MLX4_USE_MUTEX 1 MLX5_SHUT_UP_BF 1
>>>>>>> MLX5_USE_MUTEX 1
>>>>>>>
>>>>>>> if that doesn't come back the way above you need to set it:
>>>>>>>
>>>>>>> mmchconfig envVar="MLX4_POST_SEND_PREFER_BF=0 MLX5_SHUT_UP_BF=1
>>>>>>> MLX5_USE_MUTEX=1 MLX4_USE_MUTEX=1"
>>>>>>
>>>>>> i just set this (wasn't set before), but the problem is still present.
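fwiw, to double check that the restarted daemons actually picked these
variables up, something like this should list them (a quick sketch,
untested; assumes the gpfs daemon process is called mmfsd):

  # show the MLX* variables in the running daemon's environment
  tr '\0' '\n' < /proc/$(pgrep -o mmfsd)/environ | grep MLX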
>>>>>>> there was a problem in the Mellanox FW in various versions that was
>>>>>>> never completely addressed (bugs were found and fixed, but it was
>>>>>>> never fully proven to be addressed). the above environment variables
>>>>>>> turn code on in the mellanox driver that prevents this potential
>>>>>>> code path from being used to begin with.
>>>>>>>
>>>>>>> in Spectrum Scale 4.2.4 (not yet released) we added a workaround in
>>>>>>> Scale so that even if you don't set these variables the problem
>>>>>>> can't happen anymore; until then the only choice you have is the
>>>>>>> envVar above (which btw ships as default on all ESS systems).
>>>>>>>
>>>>>>> you also should be on the latest available Mellanox FW & drivers, as
>>>>>>> not all versions even have the code that is activated by the
>>>>>>> environment variables above; i think at a minimum you need to be at
>>>>>>> 3.4 but i don't remember the exact version. there had been multiple
>>>>>>> defects opened around this area, the last one i remember was:
>>>>>>
>>>>>> we run mlnx ofed 4.1, fw is not the latest, but we have edr cards
>>>>>> from dell, and the fw is a bit behind. i'm trying to convince dell to
>>>>>> make a new one. mellanox used to allow you to make your own, but they
>>>>>> don't anymore.
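for reference, this is how we check what we're actually running (a sketch;
the exact output format varies a bit between ofed releases and cards):

  # installed ofed release
  ofed_info -s
  # firmware version per hca port
  ibstat | grep -i 'firmware'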
>>>>>>> 00154843 : ESS ConnectX-3 performance issue - spinning on
>>>>>>> pthread_spin_lock
>>>>>>>
>>>>>>> you may ask your mellanox representative if they can get you access
>>>>>>> to this defect. while it was found on ESS, meaning on PPC64 and with
>>>>>>> ConnectX-3 cards, it's a general issue that affects all cards, on
>>>>>>> intel as well as Power.
>>>>>>
>>>>>> ok, thanks for this. maybe such a reference is enough for dell to
>>>>>> update their firmware.
>>>>>>
>>>>>> stijn
>>>>>>
>>>>>>> On Wed, Aug 2, 2017 at 8:58 AM Stijn De Weirdt <stijn.dewei...@ugent.be>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> hi all,
>>>>>>>>
>>>>>>>> is there any documentation wrt data integrity in spectrum scale:
>>>>>>>> assuming a crappy network, does gpfs guarantee somehow that data
>>>>>>>> written by a client ends up safe in the nsd gpfs daemon; and
>>>>>>>> similarly from the nsd gpfs daemon to disk?
>>>>>>>>
>>>>>>>> and wrt a crappy network, what about rdma on a crappy network? is
>>>>>>>> it the same?
>>>>>>>>
>>>>>>>> (we are hunting down a crappy infiniband issue; ibm support says
>>>>>>>> it's a network issue; and we see no errors anywhere...)
>>>>>>>>
>>>>>>>> thanks a lot,
>>>>>>>>
>>>>>>>> stijn

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss