Hi,

The major change around 4.X in quotas was the introduction of dynamic shares. In the past, every client share request was for a constant number of blocks (20 blocks by default). For a high-performing system that wasn't always enough (imagine 320M shares for nodes writing at 20GB/s). So dynamic shares mean that a client node can request 10000 blocks, etc. (it doesn't mean the server will actually grant that many...). OTOH, a node failure will leave more "stale in doubt" capacity, since the server doesn't know how much of the share was actually used.
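To put rough numbers on that - just a back-of-the-envelope sketch in Python, assuming the 16 MiB block size implied by the "20 blocks = 320M" and "1024 blocks = 16G" figures in this thread:

BLOCK = 16 * 2**20          # 16 MiB block size - inferred from the figures above, an assumption
WRITE_RATE = 20 * 10**9     # 20 GB/s aggregate write rate, as in the example above

old_share = 20 * BLOCK      # fixed pre-4.X share: ~320 MiB
print(f"fixed 20-block share lasts ~{old_share / WRITE_RATE * 1000:.0f} ms at 20 GB/s")

dyn_share = 10000 * BLOCK   # a large dynamic share request
print(f"a 10000-block dynamic share is ~{dyn_share / 2**30:.0f} GiB")

# the flip side: a node that crashes holding a 1024-block (16 GiB) share of
# which only 20 MiB was written leaves ~16 GiB "in doubt" until mmcheckquota
print(f"10 such crashed nodes leave ~{10 * 1024 * BLOCK / 2**40:.1f} TiB in doubt")

In other words, the old fixed share drains in milliseconds at those write rates, while a large dynamic share held by a crashed node can sit unaccounted for until the next reconciliation.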
Imagine a client node getting 1024 blocks (16G), using 20M and then crashing. From the server's perspective there are 16G "unknown"; now multiply that across multiple nodes... The only way to resolve it is indeed to execute mmcheckquota - but as you probably know, it's not cheap. So, do you experience a large number of node expels/crashes etc. that might be related to that? (Otherwise it might be some other bug that needs to be fixed...)

Regards,

Tomer Perry
Scalable I/O Development (Spectrum Scale)
email: [email protected]
1 Azrieli Center, Tel Aviv 67021, Israel
Global Tel: +1 720 3422758
Israel Tel: +972 3 9188625
Mobile: +972 52 2554625


From:    Jaime Pinto <[email protected]>
To:      [email protected]
Date:    07/10/2019 17:40
Subject: [EXTERNAL] Re: [gpfsug-discuss] Large in doubt on fileset
Sent by: [email protected]


We run DSS as well, also 4.2.x versions, and large in-doubt entries are common on our file systems, much larger than what you are seeing, for USR, GRP and FILESET. It didn't use to be so bad on versions 3.4|3.5 in other IBM appliances (GSS, ESS), or even DDN's or Cray's G200.

Under the 4.x series the internal automatic mechanism to reconcile accounting seems very laggy by default, and I couldn't find (yet) a config parameter to adjust this. I stopped trying to understand why this happens. Our users are all subject to quotas and can't wait indefinitely for this reconciliation, so I just run mmcheckquota every 6 hours via a crontab.

I hope version 5 is better. Will know in a couple of months.

Jaime

On 2019-10-07 10:07 a.m., Jonathan Buzzard wrote:
>
> I have a DSS-G system running 4.2.3-7, and on Friday afternoon became
> aware that there is a very large (at least I have never seen anything
> on this scale before) in doubt on a fileset. It has persisted over the
> weekend and is sitting at 17.5TB, with the fileset having a 150TB quota
> and only 82TB in use.
>
> There are a relatively large 26,500 files in doubt, though there are no
> quotas on file numbers for the fileset. This has come down from some
> 47,500 on Friday, when the in doubt was a shade over 18TB.
>
> The largest in doubt I have seen in the past was on the order of a few
> hundred GB under very heavy write, and it went away very quickly after
> the writing stopped.
>
> There is no evidence of heavy writing going on in the file system, so I
> am perplexed as to why the in doubt is remaining so high.
>
> Any thoughts as to what might be going on?
>
>
> JAB.
>

************************************
TELL US ABOUT YOUR SUCCESS STORIES
http://www.scinethpc.ca/testimonials
************************************
---
Jaime Pinto - Storage Analyst
SciNet HPC Consortium - Compute/Calcul Canada
www.scinet.utoronto.ca - www.computecanada.ca
University of Toronto
661 University Ave. (MaRS), Suite 1140
Toronto, ON, M5G1M1
P: 416-978-2755
C: 416-505-1477
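As a side note on the cron workaround Jaime describes: rather than unconditionally running mmcheckquota every 6 hours, one could look at the in-doubt values first and only reconcile when they get large. Below is a rough, untested Python sketch of the idea - the filesystem name "gpfs0", the 1 TiB threshold, and the assumption that mmrepquota -Y reports block values in KiB with a blockInDoubt column are all placeholders/assumptions to verify against the HEADER record on your own system.

# Untested sketch: only kick off an (expensive) mmcheckquota when some
# fileset's in-doubt value exceeds a threshold. Column positions are taken
# from the mmrepquota -Y HEADER record rather than hardcoded, since they may
# differ between releases; "gpfs0" and the 1 TiB threshold are placeholders.
import subprocess

FS = "gpfs0"                 # placeholder filesystem (device) name
THRESHOLD_BYTES = 1 << 40    # 1 TiB in doubt, purely as an example

out = subprocess.run(
    ["/usr/lpp/mmfs/bin/mmrepquota", "-j", "-Y", FS],
    check=True, capture_output=True, text=True).stdout

header, rows = None, []
for line in out.splitlines():
    fields = line.split(":")
    if "HEADER" in fields:
        header = [f.lower() for f in fields]
    elif header:
        rows.append(fields)

name_col = header.index("name")
doubt_col = next(i for i, f in enumerate(header) if "block" in f and "indoubt" in f)

suspect = []
for row in rows:
    try:
        in_doubt_kib = int(row[doubt_col])   # block values should be in KiB - verify locally
    except (IndexError, ValueError):
        continue
    if in_doubt_kib * 1024 > THRESHOLD_BYTES:
        suspect.append((row[name_col], in_doubt_kib))

for name, kib in suspect:
    print(f"fileset {name}: ~{kib / 2**30:.2f} TiB in doubt")

if suspect:
    # uncomment once happy with the numbers; this is the expensive part
    # subprocess.run(["/usr/lpp/mmfs/bin/mmcheckquota", FS], check=True)
    pass

Whether that is worth the extra moving parts over a plain cron'd mmcheckquota depends on how expensive mmcheckquota is on the system in question.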
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
