Hi Carl,

Sven had mentioned the RMW penalty before, which can make it beneficial to use smaller blocks. If you have traditional RAIDs and go the usual route of making the track size equal to the block size (stripe size = BS/n with n+p RAIDs), you may run into problems if your I/Os are typically, or very often, smaller than a block: the controller needs to read the entire track, modify it according to your I/O, and write it back together with the parity stripes.

Example: with a 4MiB block size and 8+2 RAIDs as NSDs, for each I/O smaller than 4MiB that reaches an NSD, the controller needs to read 4MiB into a buffer, modify it according to the I/O, recalculate parity for the whole track, and write back 5MiB (8 data stripes of 512kiB plus two parity stripes). In those cases you might be better off with smaller block sizes. In the above scenario it might, however, still be OK to leave the block size at 4MiB and just reduce the track size of the RAIDs. One has to check how that affects performance; YMMV, I'd say.
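To make the arithmetic concrete, here is a minimal Python sketch of the read/write amplification of such a read-modify-write cycle. It assumes the track size equals the GPFS block size and that every partial-track write triggers a full-track RMW; real controllers may optimize partial-stripe writes, so treat this as an illustration of the worst case described above, not any vendor's controller logic.

    # Read/write amplification of one I/O on an n+p RAID that does a
    # full-track read-modify-write for partial writes (illustrative only).
    MIB = 1 << 20

    def rmw_amplification(io_bytes, n_data=8, n_parity=2, block_size=4 * MIB):
        """Bytes physically read and written for one I/O of io_bytes.

        Assumes track size == GPFS block size, so strip size = block_size / n_data
        (512 KiB with 4 MiB blocks on 8+2), and any I/O smaller than a full
        track triggers a full-track read-modify-write.
        """
        strip = block_size // n_data
        if io_bytes >= block_size:
            read = 0                         # full-track write: no read needed
        else:
            read = n_data * strip            # read the whole 4 MiB track first
        write = (n_data + n_parity) * strip  # 5 MiB: 8 data + 2 parity strips
        return read, write

    # A 4 KiB application write turns into 4 MiB read plus 5 MiB written:
    r, w = rmw_amplification(4096)
    print(f"read {r / MIB:.0f} MiB, wrote {w / MIB:.0f} MiB for a 4 KiB I/O")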
Mind that the ESS uses a clever way to mask these types of I/O from the n+p Reed-Solomon based vdisks, but even there one might need to think ...

Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke
IT Specialist
High Performance Computing Services / Integrated Technology Services / Data Center Services
-------------------------------------------------------------------------------------------------------------------------------------------
IBM Deutschland
Rathausstr. 7
09111 Chemnitz
Phone: +49 371 6978 2165
Mobile: +49 175 575 2877
E-Mail: [email protected]
-------------------------------------------------------------------------------------------------------------------------------------------
IBM Deutschland Business & Technology Services GmbH / Management (Geschäftsführung): Thomas Wolter, Sven Schooß
Registered office: Ehningen / Registration court: Amtsgericht Stuttgart, HRB 17122

From: Carl <[email protected]>
To: gpfsug main discussion list <[email protected]>
Date: 02/07/2018 11:57
Subject: Re: [gpfsug-discuss] subblock sanity check in 5.0
Sent by: [email protected]

Thanks Olaf and Sven,

It looks like a lot of advice from the wiki ( https://www.ibm.com/developerworks/community/wikis/home?lang=en-us#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Data%20and%20Metadata ) is no longer relevant for version 5. Any idea if it's likely to be updated soon?

The new subblock changes appear to have removed a lot of reasons for using smaller block sizes. In broad terms, are there any situations where you would recommend using less than the new default block size?

Cheers,
Carl.

On Mon, 2 Jul 2018 at 17:55, Sven Oehme <[email protected]> wrote:

Olaf, he is talking about indirect size, not subblock size.

Carl, here is a screenshot of a 4MB file system:

[root@p8n15hyp ~]# mmlsfs all_local

File system attributes for /dev/fs2-4m-07:
==========================================
flag                         value                    description
-------------------          ------------------------ -----------------------------------
 -f                          8192                     Minimum fragment (subblock) size in bytes
 -i                          4096                     Inode size in bytes
 -I                          32768                    Indirect block size in bytes
 -m                          1                        Default number of metadata replicas
 -M                          2                        Maximum number of metadata replicas
 -r                          1                        Default number of data replicas
 -R                          2                        Maximum number of data replicas
 -j                          scatter                  Block allocation type
 -D                          nfs4                     File locking semantics in effect
 -k                          all                      ACL semantics in effect
 -n                          512                      Estimated number of nodes that will mount file system
 -B                          4194304                  Block size
 -Q                          none                     Quotas accounting enabled
                             none                     Quotas enforced
                             none                     Default quotas enabled
 --perfileset-quota          No                       Per-fileset quota enforcement
 --filesetdf                 No                       Fileset df enabled?
 -V                          19.01 (5.0.1.0)          File system version
 --create-time               Mon Jun 18 12:30:54 2018 File system creation time
 -z                          No                       Is DMAPI enabled?
 -L                          33554432                 Logfile size
 -E                          Yes                      Exact mtime mount option
 -S                          relatime                 Suppress atime mount option
 -K                          whenpossible             Strict replica allocation option
 --fastea                    Yes                      Fast external attributes enabled?
 --encryption                No                       Encryption enabled?
 --inode-limit               4000000000               Maximum number of inodes
 --log-replicas              0                        Number of log replicas
 --is4KAligned               Yes                      is4KAligned?
 --rapid-repair              Yes                      rapidRepair enabled?
 --write-cache-threshold     0                        HAWC Threshold (max 65536)
 --subblocks-per-full-block  512                      Number of subblocks per full block
 -P                          system                   Disk storage pools in file system
 --file-audit-log            No                       File Audit Logging enabled?
 --maintenance-mode          No                       Maintenance Mode enabled?
 -d                          RG001VS001;RG002VS001;RG003VS002;RG004VS002  Disks in file system
 -A                          no                       Automatic mount option
 -o                          none                     Additional mount options
 -T                          /gpfs/fs2-4m-07          Default mount point
 --mount-priority            0                        Mount priority

As you can see, the indirect size is 32k.

Sven

On Mon, Jul 2, 2018 at 9:46 AM Olaf Weiser <[email protected]> wrote:

Hi Carl,

8k for a 4 MB block size. Files smaller than ~3.x KB fit into the inode; for "larger" files (> 3.x KB) at least one "subblock" is allocated. In releases < 5.x the subblock size was fixed at 1/32 of the block size, so the subblock size was derived from the block size. Since release 5 (i.e., for newly created file systems) the new default block size is 4 MB and the fragment size is 8k (512 subblocks). For even larger block sizes, more subblocks are available per block, so e.g. 8M gives 1024 subblocks (fragment size is 8k again).

@Sven: correct me if I'm wrong ...

From: Carl <[email protected]>
To: gpfsug main discussion list <[email protected]>
Date: 07/02/2018 08:55 AM
Subject: Re: [gpfsug-discuss] subblock sanity check in 5.0
Sent by: [email protected]

Hi Sven,

What is the resulting indirect-block size with a 4MB metadata block size? Does the new sub-block magic mean that it will take up 32k, or will it occupy 128k?

Cheers,
Carl.

On Mon, 2 Jul 2018 at 15:26, Sven Oehme <[email protected]> wrote:

Hi,

Most traditional RAID controllers can't deal well with block sizes above 4MB, which is why the new default is 4MB, and I would leave it at that unless you know for sure you get better performance with 8MB. That typically requires your RAID controller volume's full block size to be 8MB, with maybe an 8+2p at 1MB strip size (many people confuse strip size with full track size).

If you don't have dedicated SSDs for metadata, I would recommend just using a 4MB block size with mixed data and metadata disks. If you have a reasonable number of SSDs, put them in a RAID 1 or RAID 10 and use them as dedicated metadata and the other disks as data only, but I would not use the --metadata-block-size parameter, as it prevents the data pool from using a large number of subblocks.

As long as your SSDs are on RAID 1 or 10 there is no read/modify/write penalty, so using them with the 4MB block size has no real negative impact, at least on the controllers I have worked with.

Hope this helps.

On Tue, Jun 26, 2018 at 5:18 PM Joseph Mendoza <[email protected]> wrote:

Hi, it's for a traditional NSD setup.

--Joey

On 6/26/18 12:21 AM, Sven Oehme wrote:

Joseph,

The subblock size will be derived from the smallest block size in the filesystem. Given you specified a metadata block size of 512k, that's what will be used to calculate the number of subblocks, even though your data pool is 8MB.

Is this setup for a traditional NSD setup or for GNR? The recommendations would be different.

Sven

On Tue, Jun 26, 2018 at 2:59 AM Joseph Mendoza <[email protected]> wrote:

Quick question: anyone know why GPFS wouldn't respect the default for the subblocks-per-full-block parameter when creating a new filesystem? I'd expect it to be set to 512 for an 8MB block size, but my guess is that also specifying a metadata-block-size is interfering with it (by being too small). This was a parameter recommended by the vendor for a 4.2 installation with metadata on dedicated SSDs in the system pool; any best practices for 5.0? I'm guessing I'd have to bump it up to at least 4MB to get 512 subblocks for both pools.
fs1 created with:

# mmcrfs fs1 -F fs1_ALL -A no -B 8M -i 4096 -m 2 -M 2 -r 1 -R 2 -j cluster -n 9000 --metadata-block-size 512K --perfileset-quota --filesetdf -S relatime -Q yes --inode-limit 20000000:10000000 -T /gpfs/fs1

# mmlsfs fs1
<snipped>
flag                         value                    description
-------------------          ------------------------ -----------------------------------
 -f                          8192                     Minimum fragment (subblock) size in bytes (system pool)
                             131072                   Minimum fragment (subblock) size in bytes (other pools)
 -i                          4096                     Inode size in bytes
 -I                          32768                    Indirect block size in bytes
 -B                          524288                   Block size (system pool)
                             8388608                  Block size (other pools)
 -V                          19.01 (5.0.1.0)          File system version
 --subblocks-per-full-block  64                       Number of subblocks per full block
 -P                          system;DATA              Disk storage pools in file system

Thanks!
--Joey Mendoza
NCAR
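Putting Sven's derivation rule next to Joey's mmlsfs output, here is a minimal Python sketch of the subblock arithmetic described in this thread. The 8 KiB minimum fragment size is inferred from the examples above (512K, 4M, and 8M block sizes all end up with 8 KiB fragments somewhere); the real mmcrfs logic covers more cases, so treat this as an illustration, not the actual implementation.

    # Sketch of the GPFS 5.0 subblock arithmetic discussed in this thread.
    # Pre-5.x rule (per Olaf): subblock size was fixed at block_size / 32,
    # i.e. always 32 subblocks per full block (128 KiB fragments at 4 MiB).
    KIB = 1024
    MIB = 1024 * KIB

    def v5_fragment_and_count(pool_block_sizes, min_fragment=8 * KIB):
        """5.0 rule (per Sven): the subblock size is derived from the
        smallest block size in the filesystem, and the resulting
        subblocks-per-full-block count applies to every pool.

        min_fragment=8 KiB matches the examples in this thread; other
        block sizes may use different minimums.
        """
        smallest = min(pool_block_sizes)
        count = smallest // min_fragment           # subblocks per full block
        # Each pool's fragment size follows from the filesystem-wide count:
        fragments = {bs: bs // count for bs in pool_block_sizes}
        return count, fragments

    # A 4 MiB filesystem (Sven's mmlsfs above): 512 subblocks of 8 KiB.
    print(v5_fragment_and_count([4 * MIB]))    # (512, {4194304: 8192})

    # An 8 MiB filesystem on its own: 1024 subblocks, fragment still 8 KiB.
    print(v5_fragment_and_count([8 * MIB]))    # (1024, {8388608: 8192})

    # Joey's fs1: 512 KiB metadata pool plus 8 MiB data pool. The 512 KiB
    # pool pins the count at 64, so the data pool ends up with
    # 8 MiB / 64 = 128 KiB fragments -- exactly the 131072 shown by mmlsfs.
    print(v5_fragment_and_count([512 * KIB, 8 * MIB]))
    # -> (64, {524288: 8192, 8388608: 131072})

This also illustrates why bumping the metadata block size to at least 4 MiB, as Joey guessed, would restore the 512-subblock count for both pools.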
