Hi there,

Have you been able to create a test case (replicate the problem)? Can you tell us a bit more about the setup?
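If you have not already, the output of a few basic diagnostics would help narrow things down. This is only a sketch of what I mean (standard GPFS/Spectrum Scale commands as far as I know, but please check the options against the documentation for your level before relying on them):

  # which nodes are acting as cluster/filesystem manager right now
  mmlsmgr

  # long waiters usually point at the node(s) everyone else is stuck behind
  mmdiag --waiters

  # token manager statistics, run on the token manager node
  mmdiag --tokenmgr

  # current file-cache settings that affect token/metanode behaviour
  mmlsconfig maxFilesToCache
  mmlsconfig maxStatCache

Comparing the waiters during a "bad" period with a normal one should show whether the slow mmbackup scan is really waiting on tokens for that one hot file.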
Are you using the GPFS API or administrative commands? Any problems with the network (be it Ethernet or IB)?

Sorry for turning up unannounced here for the first time, but I would like to help if I can.

Cheers,
Jose Higino, from NIWA, New Zealand

On Sun, 22 Jul 2018 at 23:26, Peter Childs <[email protected]> wrote:
> Yes, we run mmbackup, using a snapshot.
>
> The scan usually takes an hour, but for the last week it has been taking
> many hours (I saw it take 12 last Tuesday).
>
> It has sped up again now, back to its normal hour, but the high-IO jobs
> accessing the same file from many nodes also look to have come to an end
> for the time being.
>
> I was trying to figure out how to control the bad IO using mmchqos, to
> prioritise certain nodes over others, but had not worked out whether that
> was possible yet.
>
> We've only previously seen this problem when we had some bad disks in our
> storage, which we replaced; I've checked and I can't see that issue
> currently.
>
> Thanks for the help.
>
> Peter Childs
> Research Storage
> ITS Research and Teaching Support
> Queen Mary, University of London
>
> ---- Yaron Daniel wrote ----
>
> Hi
>
> Do you run mmbackup on a snapshot, which is read only?
>
> Regards
>
> Yaron Daniel
> Storage Architect – IL Lab Services (Storage)
> IBM Global Markets, Systems HW Sales
> 94 Em Ha'Moshavot Rd, Petach Tiqva, 49527, Israel
> Phone: +972-3-916-5672
> Fax: +972-3-916-5672
> Mobile: +972-52-8395593
> e-mail: [email protected]
> IBM Israel <http://www.ibm.com/il/he/>
>
> From: Peter Childs <[email protected]>
> To: "[email protected]" <[email protected]>
> Date: 07/10/2018 05:51 PM
> Subject: [gpfsug-discuss] Same file opened by many nodes / processes
> Sent by: [email protected]
>
> We have a situation where the same file is being read by around 5000
> "jobs". This is an array job in UGE with a tc set, so the file in
> question is being opened by about 100 processes/jobs at the same time.
>
> It's a ~200GB file, so copying the file locally first is not an easy
> answer, and these jobs are causing issues with mmbackup scanning the
> file system, in that the scan is taking 3 hours instead of the normal
> 40-60 minutes.
>
> This is read-only access to the file; I don't know the specifics about
> the job.
>
> It looks like the metanode is moving around a fair amount (given what I
> can see from mmfsadm saferdump file).
>
> I'm wondering if there is anything we can do to improve things or that
> can be tuned within GPFS. I don't think we have an issue with token
> management, but would increasing maxFilesToCache on our token manager
> node help, say?
>
> Is there anything else I should look at, to try to allow GPFS to share
> this file better?
> Thanks in advance
>
> Peter Childs
>
> --
> Peter Childs
> ITS Research Storage
> Queen Mary, University of London
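PS: on the mmchqos and maxFilesToCache points above, this is roughly the sort of thing I had in mind. It is only a sketch: the filesystem name (fs1), the IOPS figure and the node class (managernodes) are made up, and as far as I know the standard QoS classes throttle "maintenance" and "other" I/O per storage pool rather than per node, so whether you can single out the offending nodes depends on your Scale level.

  # cap ordinary user I/O ("other" class) so the mmbackup scan
  # ("maintenance" class) is not starved -- applies per pool, not per node
  mmchqos fs1 --enable pool=*,other=10000IOPS,maintenance=unlimited

  # see what QoS is currently configured and consuming
  mmlsqos fs1

  # raise the file cache on the token manager node(s); I believe this
  # only takes effect after GPFS is restarted on those nodes
  mmchconfig maxFilesToCache=200000 -N managernodes

If token traffic for that single file turns out not to be the bottleneck, the QoS cap on the "other" class is probably the more direct lever, since it limits the competing jobs' I/O without touching the scan itself.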
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
