As Simon already mentioned, set similar quotas at both the cache and home clusters to avoid the queue getting stuck when quotas are exceeded at home.
>At home we had replication of two so it wasn't straightforward to set the same quotas on cache, we could just about fudge it for user home directories but not for most of our project storage as we use dependent fileset quotas.

AFM will support dependent filesets from 5.0.4. Dependent filesets can be created at the cache inside the independent fileset, with the same quotas set as at home.

>We also saw issues with data in inode at home as this doesn't work at AFM cache so it goes into a block. I've forgotten the exact issues around that now.

AFM uses some inode space to store the remote file attributes (file handle, file times, etc.) as part of the EAs. If the file does not have hard links, the maximum inode space used by AFM is around 200 bytes. The AFM cache can store a file's data in the inode if the inode has 200 bytes or more of free space; otherwise the file's data is stored in a subblock rather than a full block. For example, if the inode size is 4K at both cache and home, the home file size is 3K, and the inode uses 300 bytes to store the file metadata, then the free space in the inode at home is 724 bytes (4096 - (3072 + 300)). When this file is cached by AFM, AFM adds internal EAs of 200 bytes, so the free space in the inode at the cache becomes 524 bytes (4096 - (3072 + 300 + 200)). If the file size is 3600 bytes at home, AFM cannot store the data in the inode at the cache. So AFM stores the file data in a block only when the inode does not have enough space for both the data and the internal EAs.

~Venkat ([email protected])

From: Simon Thompson <[email protected]>
To: gpfsug main discussion list <[email protected]>
Date: 10/12/2019 01:52 AM
Subject: [EXTERNAL] Re: [gpfsug-discuss] Quotas and AFM
Sent by: [email protected]

Oh and I forgot. This only works if you precache the data from home. Otherwise the disk usage at cache is only what you cached, as you don't know what size it is from home. Unless something has changed recently, at any rate.
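Venkat's inode arithmetic can be sketched as a short calculation. This is an illustrative model only: the 200-byte EA overhead and 300-byte metadata figures are the example values from the mail above, not fixed constants, and real per-file overheads vary.

```python
# Toy model of data-in-inode eligibility at home vs. at the AFM cache.
# Figures (4K inode, 300 B metadata, ~200 B AFM EAs) come from Venkat's
# example above and are assumptions, not guaranteed values.

INODE_SIZE = 4096      # inode size at both cache and home (bytes)
AFM_EA_OVERHEAD = 200  # approx. space AFM's internal EAs consume at the cache
METADATA_USED = 300    # metadata already stored in the inode (example value)

def free_inode_space(file_size, ea_overhead=0):
    """Bytes left in the inode after data, metadata and any AFM EAs."""
    return INODE_SIZE - (file_size + METADATA_USED + ea_overhead)

def fits_in_inode_at_cache(file_size):
    """True if the cached copy can keep its data in the inode."""
    return free_inode_space(file_size, AFM_EA_OVERHEAD) >= 0

# A 3 KiB file: 724 bytes free at home, 524 at the cache -> stays in inode.
print(free_inode_space(3072))                   # 724
print(free_inode_space(3072, AFM_EA_OVERHEAD))  # 524
print(fits_in_inode_at_cache(3072))             # True
# A 3600-byte file fits at home but not at the cache, so it spills to a subblock.
print(fits_in_inode_at_cache(3600))             # False
```

The boundary cases (files between roughly 3600 and 3800 bytes in this example) are exactly the ones that consume block quota at the cache but not at home.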
Simon

From: [email protected] <[email protected]> on behalf of Simon Thompson <[email protected]>
Sent: Friday, October 11, 2019 9:10:20 PM
To: gpfsug main discussion list <[email protected]>
Subject: Re: [gpfsug-discuss] Quotas and AFM

Yes, just set the quotas the same on both. Or a default quota, and have exceptions if that works in your case.

But this was where I think data-in-inode is an issue if you have a lot of small files: in the inode at home they don't consume quota, I think, but as they are in a data block at cache they do. So it might not be quite so straightforward.

And yes, writes at home just get "out of space"; it's the AFM cache that fails on the write back to home, but then it's in the queue and can block it.

Simon

From: [email protected] <[email protected]> on behalf of Ryan Novosielski <[email protected]>
Sent: Friday, October 11, 2019 9:05:15 PM
To: gpfsug main discussion list <[email protected]>
Subject: Re: [gpfsug-discuss] Quotas and AFM

Do you know, is there anything that prevents me from just setting the quotas the same on the IW cache, if there's no way to inherit? For the case of the home directories it's simple, as they are all 100G with some exceptions, so a default user quota takes care of almost all of it. Luckily, that's right now where our problem is, but we have the potential with other filesets later.

I'm also wondering if you can confirm that I should /not/ need to be looking at people who are writing to the home fileset, where the quotas are set, as a problem syncing TO the cache, e.g. they don't add to the queue. I assume GPFS sees the over-quota and just denies the write, yes? I originally thought the problem was in that direction and was totally perplexed about how it could be so stupid. 😅

--
 ____
|| \\UTGERS,    |---------------------------*O*---------------------------
||_// the State  |         Ryan Novosielski - [email protected]
|| \\ University | Sr.
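Simon's small-files concern can be made concrete with a toy calculation. It assumes the data-in-inode thresholds from the example earlier in the thread and an illustrative subblock size; real values depend on filesystem configuration.

```python
# Toy model of the quota-accounting gap with many small files: files that
# live in the inode at home are charged no data blocks there, but the same
# files can exceed the inode's capacity at the AFM cache once AFM's ~200-byte
# internal EAs are added, so each one is charged a subblock. All sizes here
# are illustrative assumptions, not Spectrum Scale defaults.

INODE_SIZE = 4096
METADATA = 300        # example per-file metadata already in the inode
AFM_EAS = 200         # approx. AFM internal EA overhead at the cache
SUBBLOCK = 8192       # assumed subblock size (bytes)

def block_usage(file_sizes, ea_overhead):
    """Data-block quota charged: one subblock per file that no longer fits in its inode."""
    limit = INODE_SIZE - METADATA - ea_overhead
    return sum(SUBBLOCK for s in file_sizes if s > limit)

files = [3600] * 10000             # ten thousand 3600-byte files
print(block_usage(files, 0))       # at home: 0 -- all data stays in the inode
print(block_usage(files, AFM_EAS)) # at cache: 81_920_000 bytes of quota consumed
```

This is why simply copying the home quota numbers to the cache can still leave cache usage running ahead of home usage for small-file-heavy filesets.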
Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
     `'

On Oct 11, 2019, at 15:56, Simon Thompson <[email protected]> wrote:

Yes. When we ran AFM, we had exactly this issue. What would happen is that a user/fileset quota would be hit and a compute job would continue writing. This would eventually fill the AFM queue. If you were lucky you could stop and restart the queue and it would process other files from other users, but inevitably we'd get back to the same state. The solution was to increase the quota at home to clear the queue, kill the user workload and then reduce their quota again.

At home we had replication of two, so it wasn't straightforward to set the same quotas on cache; we could just about fudge it for user home directories but not for most of our project storage, as we use dependent fileset quotas.

We also saw issues with data in inode at home, as this doesn't work at the AFM cache so it goes into a block. I've forgotten the exact issues around that now.

So our experience was much like you describe.

Simon

From: <[email protected]> on behalf of Ryan Novosielski <[email protected]>
Sent: Friday, 11 October 2019, 18:43
To: gpfsug main discussion list
Subject: [gpfsug-discuss] Quotas and AFM

Does anyone have any good resources or experience with quotas and AFM caches? Our scenario is that we have an AFM home on one site, an AFM cache on another site, and then a client cluster on that remote site that mounts the cache. The AFM filesets are IW.

One of them contains our home directories, which have a quota set on the home side. Quotas were disabled entirely on the cache side (I enabled them recently, but did not set them to anything). What I believe we're running into is scary long AFM queues that are caused by people writing an amount that is over the home quota to the cache; the cache is accepting it and then failing to sync back to the home because the user is at their hard limit.
I believe we're also seeing delays for unaffected users who are not over their quota, but that's harder to tell. We have the AFM gateways poorly/not tuned, so that is likely interacting.

Is there any way to make the quotas apparent to the cache cluster too, beyond setting a quota there as well, or do I just fundamentally misunderstand this in some other way? We really just want the quotas on the home cluster to be enforced everywhere, more or less.

Thanks!

--
 ____
|| \\UTGERS,    |---------------------------*O*---------------------------
||_// the State  |         Ryan Novosielski - [email protected]
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
     `'

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
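Simon's "replication of two" remark earlier in the thread is another reason matching quota numbers isn't enough. Block quota is charged on replicated blocks, so with data replication 2 at home and (assumed) replication 1 at the cache, the same limit admits twice as much user data on the cache side. A sketch under those assumptions:

```python
# Sketch of the replication mismatch: identical quota limits on home
# (replication 2) and cache (replication 1) allow different amounts of
# user data, so the cache accepts writes that home must reject on
# write-back. Replication factors and sizes are illustrative assumptions.

QUOTA_LIMIT = 100 * 1024**3   # 100 GiB hard limit set on both sides

def quota_charged(data_bytes, replication):
    """Block quota consumed by data_bytes at a given replication factor."""
    return data_bytes * replication

def max_data(limit, replication):
    """Most user data that fits under the limit."""
    return limit // replication

print(max_data(QUOTA_LIMIT, 1))  # cache: 100 GiB of data allowed
print(max_data(QUOTA_LIMIT, 2))  # home:   50 GiB before writes fail

# A 60 GiB write is accepted at the cache but cannot be replayed at home,
# which is exactly the stuck-queue scenario described in this thread:
data = 60 * 1024**3
print(quota_charged(data, 1) <= QUOTA_LIMIT)  # True  (cache accepts)
print(quota_charged(data, 2) <= QUOTA_LIMIT)  # False (home rejects; queue stalls)
```

Halving the cache-side quota limit is one crude workaround for this particular mismatch, at the cost of the cache under-reporting the space users actually have at home.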
