Hi Sven (and Stephen and everyone else),

I know there are certainly things you know but can’t talk about, but I suspect 
I’m not the only one wondering about the possible significance of “with the 
released code” in your response below!

I understand the technical point you’re making, and maybe the solution for me 
is to just use a 4 MB block size for my metadata-only system pool?  As Stephen 
Ulmer said in his response (“Why the desire for a 1 MB block size for metadata? 
It is RAID 1, so no re-write penalty or need to hit a stripe size. Are you just 
trying to save the memory?  If you had a 4 MB block size, an 8 KB sub-block 
size and things were 4K-aligned, you would always read 2 4K inodes.”), if I’m 
using RAID 1 with 4K inodes, am I gaining anything by going with a smaller 
block size for metadata?
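
If the answer really is just “match them”, then (assuming I’m reading the 
options right) I’d reuse my original mmcrfs invocation and stanza file from 
further down in this thread, drop --metadata-block-size, and change the system 
pool’s blockSize from 1M to 4M in the stanza, i.e. something like:

mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j scatter -k 
all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 -v yes 
--nofilesetdf

and then I’d expect something like “mmlsfs gpfs5 -f” to report an 8K minimum 
fragment size for the system pool and the other pools alike.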

So why did I choose 1 MB in the first place?  Well, I was planning to do some 
experimenting with different metadata block sizes to see if it made any 
difference.  Historically, we used a metadata block size of 64K to
match the hardware “stripe” size on the storage arrays (RAID 1 mirrors of hard 
drives back in the day).  Now our metadata is on SSDs so with our latest 
filesystem we used 1 MB for both data and metadata because of the 1/32nd 
sub-block thing in GPFS 4.x.  Since GPFS 5 removes that restriction, I was 
going to do some experimenting, but if the correct answer is just “if 4 MB is 
what’s best for your data, then use it for metadata too” then I don’t mind 
saving some time…. ;-)

Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
[email protected] - (615) 875-9633

On Aug 1, 2018, at 4:01 PM, Sven Oehme <[email protected]> wrote:

the only way to get max number of subblocks for a 5.0.x filesystem with the 
released code is to have metadata and data use the same blocksize.

sven

On Wed, Aug 1, 2018 at 11:52 AM Buterbaugh, Kevin L <[email protected]> wrote:
All,

Sorry for the second e-mail, but I realize that 4 MB is 4 times 1 MB … so does 
this go back to what Marc is saying, that there’s really only one 
subblocks-per-full-block parameter?  If so, is there any way to get what I want 
as described below?

Thanks…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
[email protected] - (615) 875-9633


On Aug 1, 2018, at 1:47 PM, Buterbaugh, Kevin L <[email protected]> wrote:

Hi Sven,

OK … but why?  I mean, that’s not what the man page says.  Where does that “4 
x” come from?

And, most importantly … that’s not what I want.  I want a smaller block size 
for the system pool since it’s metadata only and on RAID 1 mirrors (HD’s on the 
test cluster but SSD’s on the production cluster).  So … side question … is 1 
MB OK there?

But I want a 4 MB block size for data with an 8 KB sub-block … I want good 
performance for the sane people using our cluster without unduly punishing the 
… ahem … fine folks whose apps want to create a bazillion tiny files!

So how do I do that?

Thanks!

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
[email protected] - (615) 875-9633


On Aug 1, 2018, at 1:41 PM, Sven Oehme <[email protected]> wrote:

the number of subblocks is derived from the smallest blocksize in any pool of 
a given filesystem. so if you pick a metadata blocksize of 1M the subblock will 
be 8k in the metadata pool, but 4x that in the data pool if your data pool is 
4M.
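
in your case that works out to 1M / 8k = 128 subblocks per full block, and 
that same 128 applied to the 4M data pools gives 4M / 128 = 32k, which is the 
32768 fragment size mmlsfs reports for the other pools.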

sven
On Wed, Aug 1, 2018 at 11:21 AM Felipe Knop <[email protected]> wrote:

Marc, Kevin,

We'll be looking into this issue, since at least at first glance it does look 
odd.  A 4MB block size should have resulted in an 8KB subblock size.  I suspect 
that, somehow, the --metadata-block-size 1M may have resulted in

32768 Minimum fragment (subblock) size in bytes (other pools)


but I do not yet understand how.

The subblocks-per-full-block parameter is not supported with mmcrfs.

Felipe

----
Felipe Knop [email protected]
GPFS Development and Security
IBM Systems
IBM Building 008
2455 South Rd, Poughkeepsie, NY 12601
(845) 433-9314 T/L 293-9314





From: "Marc A Kaplan" <[email protected]<mailto:[email protected]>>


To: gpfsug main discussion list 
<[email protected]<mailto:[email protected]>>

Date: 08/01/2018 01:21 PM
Subject: Re: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?

Sent by: 
[email protected]<mailto:[email protected]>

________________________________



I haven't looked into all the details but here's a clue -- notice there is only 
one "subblocks-per-full-block" parameter.

And it is the same for both metadata blocks and data blocks.

So maybe (MAYBE) that is a constraint somewhere...

Certainly, in the currently supported code, that's what you get.
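
(You can see that single parameter in the mmlsfs output you posted: 
--subblocks-per-full-block is 128, right alongside the two different "Minimum 
fragment" values.  If you want to pull those lines out quickly, something 
along the lines of

mmlsfs gpfs5 | grep -i subblock

should show both fragment-size lines plus the subblocks-per-full-block line.)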




From: "Buterbaugh, Kevin L" 
<[email protected]<mailto:[email protected]>>
To: gpfsug main discussion list 
<[email protected]<mailto:[email protected]>>
Date: 08/01/2018 12:55 PM
Subject: [gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?
Sent by: 
[email protected]<mailto:[email protected]>
________________________________



Hi All,

Our production cluster is still on GPFS 4.2.3.x, but in preparation for moving 
to GPFS 5 I have upgraded our small (7 node) test cluster to GPFS 5.0.1-1. I am 
setting up a new filesystem there using hardware that we recently life-cycled 
out of our production environment.

I “successfully” created a filesystem but I believe the sub-block size is 
wrong. I’m using a 4 MB filesystem block size, so according to the mmcrfs man 
page the sub-block size should be 8K:

Table 1. Block sizes and subblock sizes

Block size                               Subblock size
---------------------------------------  -------------
64 KiB                                   2 KiB
128 KiB                                  4 KiB
256 KiB, 512 KiB, 1 MiB, 2 MiB, 4 MiB    8 KiB
8 MiB, 16 MiB                            16 KiB

However, it appears that it’s 8K for the system pool but 32K for the other 
pools:

flag value description
------------------- ------------------------ -----------------------------------
-f 8192 Minimum fragment (subblock) size in bytes (system pool)
32768 Minimum fragment (subblock) size in bytes (other pools)
-i 4096 Inode size in bytes
-I 32768 Indirect block size in bytes
-m 2 Default number of metadata replicas
-M 3 Maximum number of metadata replicas
-r 1 Default number of data replicas
-R 3 Maximum number of data replicas
-j scatter Block allocation type
-D nfs4 File locking semantics in effect
-k all ACL semantics in effect
-n 32 Estimated number of nodes that will mount file system
-B 1048576 Block size (system pool)
4194304 Block size (other pools)
-Q user;group;fileset Quotas accounting enabled
user;group;fileset Quotas enforced
none Default quotas enabled
--perfileset-quota No Per-fileset quota enforcement
--filesetdf No Fileset df enabled?
-V 19.01 (5.0.1.0) File system version
--create-time Wed Aug 1 11:39:39 2018 File system creation time
-z No Is DMAPI enabled?
-L 33554432 Logfile size
-E Yes Exact mtime mount option
-S relatime Suppress atime mount option
-K whenpossible Strict replica allocation option
--fastea Yes Fast external attributes enabled?
--encryption No Encryption enabled?
--inode-limit 101095424 Maximum number of inodes
--log-replicas 0 Number of log replicas
--is4KAligned Yes is4KAligned?
--rapid-repair Yes rapidRepair enabled?
--write-cache-threshold 0 HAWC Threshold (max 65536)
--subblocks-per-full-block 128 Number of subblocks per full block
-P system;raid1;raid6 Disk storage pools in file system
--file-audit-log No File Audit Logging enabled?
--maintenance-mode No Maintenance Mode enabled?
-d 
test21A3nsd;test21A4nsd;test21B3nsd;test21B4nsd;test23Ansd;test23Bnsd;test23Cnsd;test24Ansd;test24Bnsd;test24Cnsd;test25Ansd;test25Bnsd;test25Cnsd
 Disks in file system
-A yes Automatic mount option
-o none Additional mount options
-T /gpfs5 Default mount point
--mount-priority 0 Mount priority

Output of mmcrfs:

mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j scatter -k 
all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 -v yes 
--nofilesetdf --metadata-block-size 1M

The following disks of gpfs5 will be formatted on node testnsd3:
test21A3nsd: size 953609 MB
test21A4nsd: size 953609 MB
test21B3nsd: size 953609 MB
test21B4nsd: size 953609 MB
test23Ansd: size 15259744 MB
test23Bnsd: size 15259744 MB
test23Cnsd: size 1907468 MB
test24Ansd: size 15259744 MB
test24Bnsd: size 15259744 MB
test24Cnsd: size 1907468 MB
test25Ansd: size 15259744 MB
test25Bnsd: size 15259744 MB
test25Cnsd: size 1907468 MB
Formatting file system ...
Disks up to size 8.29 TB can be added to storage pool system.
Disks up to size 16.60 TB can be added to storage pool raid1.
Disks up to size 132.62 TB can be added to storage pool raid6.
Creating Inode File
8 % complete on Wed Aug 1 11:39:19 2018
18 % complete on Wed Aug 1 11:39:24 2018
27 % complete on Wed Aug 1 11:39:29 2018
37 % complete on Wed Aug 1 11:39:34 2018
48 % complete on Wed Aug 1 11:39:39 2018
60 % complete on Wed Aug 1 11:39:44 2018
72 % complete on Wed Aug 1 11:39:49 2018
83 % complete on Wed Aug 1 11:39:54 2018
95 % complete on Wed Aug 1 11:39:59 2018
100 % complete on Wed Aug 1 11:40:01 2018
Creating Allocation Maps
Creating Log Files
3 % complete on Wed Aug 1 11:40:07 2018
28 % complete on Wed Aug 1 11:40:14 2018
53 % complete on Wed Aug 1 11:40:19 2018
78 % complete on Wed Aug 1 11:40:24 2018
100 % complete on Wed Aug 1 11:40:25 2018
Clearing Inode Allocation Map
Clearing Block Allocation Map
Formatting Allocation Map for storage pool system
85 % complete on Wed Aug 1 11:40:32 2018
100 % complete on Wed Aug 1 11:40:33 2018
Formatting Allocation Map for storage pool raid1
53 % complete on Wed Aug 1 11:40:38 2018
100 % complete on Wed Aug 1 11:40:42 2018
Formatting Allocation Map for storage pool raid6
20 % complete on Wed Aug 1 11:40:47 2018
39 % complete on Wed Aug 1 11:40:52 2018
60 % complete on Wed Aug 1 11:40:57 2018
79 % complete on Wed Aug 1 11:41:02 2018
100 % complete on Wed Aug 1 11:41:08 2018
Completed creation of file system /dev/gpfs5.
mmcrfs: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.

And contents of stanza file:

%nsd:
nsd=test21A3nsd
usage=metadataOnly
failureGroup=210
pool=system
servers=testnsd3,testnsd1,testnsd2
device=dm-15

%nsd:
nsd=test21A4nsd
usage=metadataOnly
failureGroup=210
pool=system
servers=testnsd1,testnsd2,testnsd3
device=dm-14

%nsd:
nsd=test21B3nsd
usage=metadataOnly
failureGroup=211
pool=system
servers=testnsd1,testnsd2,testnsd3
device=dm-17

%nsd:
nsd=test21B4nsd
usage=metadataOnly
failureGroup=211
pool=system
servers=testnsd2,testnsd3,testnsd1
device=dm-16

%nsd:
nsd=test23Ansd
usage=dataOnly
failureGroup=23
pool=raid6
servers=testnsd2,testnsd3,testnsd1
device=dm-10

%nsd:
nsd=test23Bnsd
usage=dataOnly
failureGroup=23
pool=raid6
servers=testnsd3,testnsd1,testnsd2
device=dm-9

%nsd:
nsd=test23Cnsd
usage=dataOnly
failureGroup=23
pool=raid1
servers=testnsd1,testnsd2,testnsd3
device=dm-5

%nsd:
nsd=test24Ansd
usage=dataOnly
failureGroup=24
pool=raid6
servers=testnsd3,testnsd1,testnsd2
device=dm-6

%nsd:
nsd=test24Bnsd
usage=dataOnly
failureGroup=24
pool=raid6
servers=testnsd1,testnsd2,testnsd3
device=dm-0

%nsd:
nsd=test24Cnsd
usage=dataOnly
failureGroup=24
pool=raid1
servers=testnsd2,testnsd3,testnsd1
device=dm-2

%nsd:
nsd=test25Ansd
usage=dataOnly
failureGroup=25
pool=raid6
servers=testnsd1,testnsd2,testnsd3
device=dm-6

%nsd:
nsd=test25Bnsd
usage=dataOnly
failureGroup=25
pool=raid6
servers=testnsd2,testnsd3,testnsd1
device=dm-6

%nsd:
nsd=test25Cnsd
usage=dataOnly
failureGroup=25
pool=raid1
servers=testnsd3,testnsd1,testnsd2
device=dm-3

%pool:
pool=system
blockSize=1M
usage=metadataOnly
layoutMap=scatter
allowWriteAffinity=no

%pool:
pool=raid6
blockSize=4M
usage=dataOnly
layoutMap=scatter
allowWriteAffinity=no

%pool:
pool=raid1
blockSize=4M
usage=dataOnly
layoutMap=scatter
allowWriteAffinity=no

What am I missing or what have I done wrong? Thanks…

Kevin
—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
[email protected] - (615) 875-9633


_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
