So with the help of IBM support and Venkat (thanks guys!), we think it's a problem with DMAPI. As we initially saw this as an issue with AFM replication, we had traces from there, and they had entries like:
gpfsWrite exit: failed err 688

Now apparently err 688 relates to "DMAPI disposition". Once we had this, we were able to get someone to take a look at the HSM dsmrecalld. It was running, but had failed over to a node that wasn't able to service requests properly. (We have multiple NSD servers with different file-systems, each running dsmrecalld, but I don't think you can scope nodes XYZ to filesystem ABC but not DEF.)

Anyway, once we got that fixed, a bunch of stuff in the AFM cache popped out (plus a little poke for some stuff that probably hadn't updated the metadata cache).

So hopefully it's now also solved for our other users.

What is complicated here is that a DMAPI issue was giving intermittent IO errors: people could write into new folders, but not to existing files, though I could (some sort of Schrödinger's cat IO issue??).

So hopefully we are fixed...

Simon

On 11/10/2017, 15:01, "gpfsug-discuss-boun...@spectrumscale.org on behalf of uwefa...@de.ibm.com" <gpfsug-discuss-boun...@spectrumscale.org on behalf of uwefa...@de.ibm.com> wrote:

>Usually, IO errors point to some basic problem reading/writing data.
>If there are reproducible errors, it's IMHO always a nice thing to trace
>GPFS for such an access. Often that already reveals the area where the
>cause lies and maybe even the details of it.
>
>Kind regards
>
>Dr. Uwe Falke
>IT Specialist
>High Performance Computing Services / Integrated Technology Services /
>Data Center Services
>IBM Deutschland
>
>From: "Simon Thompson (IT Research Support)" <s.j.thomp...@bham.ac.uk>
>To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
>Date: 10/11/2017 01:22 PM
>Subject: Re: [gpfsug-discuss] Checking a file-system for errors
>Sent by: gpfsug-discuss-boun...@spectrumscale.org
>
>Yes, I get that we should only be doing this if we think we have a problem.
>
>And the answer is, right now, we're not entirely clear.
>
>We have a couple of issues our users are reporting to us, and it's not
>clear to us whether they are related, an FS problem, or ACLs getting in
>the way.
>
>We do have users who are trying to work on files getting IO errors, and we
>have an AFM sync issue. The disks are all online. I poked the FS with
>tsdbfs and the files look OK (small files, but the content of the block
>matches).
>
>Maybe we have a problem with DMAPI and TSM/HSM (could that cause an IO
>error reported to the user when they access a file even if it's not an
>offline file??)
>
>We have a PMR open with IBM on this already.
>
>But we want to be sure in our own minds that we don't have an
>underlying FS problem. I.e. I want the confidence to tell my users:
>yes, I know you are seeing weird stuff, but we have run checks and are not
>introducing data corruption.
>
>Simon
>
>On 11/10/2017, 11:58, "gpfsug-discuss-boun...@spectrumscale.org on behalf
>of uwefa...@de.ibm.com" <gpfsug-discuss-boun...@spectrumscale.org on
>behalf of uwefa...@de.ibm.com> wrote:
>
>>Mostly, however, filesystem checks are only done if fs issues are
>>indicated by errors in the logs. Do you have reason to assume your fs has
>>probs?
>
>_______________________________________________
>gpfsug-discuss mailing list
>gpfsug-discuss at spectrumscale.org
>http://gpfsug.org/mailman/listinfo/gpfsug-discuss