On Jan 26, 2018, at 07:56, Thomas Roth <t.r...@gsi.de> wrote:
> 
> Hmm, option-testing leads to more confusion:
> 
> With this 922GB-sdb1 I do
> 
> mkfs.lustre --reformat --mgs --mdt ... /dev/sdb1
> 
> The output of the command says
> 
>   Permanent disk data:
> Target:     test0:MDT0000
> ...
> 
> device size = 944137MB
> formatting backing filesystem ldiskfs on /dev/sdb1
>       target name   test0:MDT0000
>       4k blocks     241699072
>       options        -J size=4096 -I 1024 -i 2560 -q -O 
> dirdata,uninit_bg,^extents,mmp,dir_nlink,quota,huge_file,flex_bg -E 
> lazy_journal_init -F
> 
> mkfs_cmd = mke2fs -j -b 4096 -L test0:MDT0000  -J size=4096 -I 1024 -i 2560 
> -q -O dirdata,uninit_bg,^extents,mmp,dir_nlink,quota,huge_file,flex_bg -E 
> lazy_journal_init -F /dev/sdb1 241699072

The default options have to be conservative, as we don't know in advance how a 
filesystem will be used.  It may be that some sites will have lots of hard 
links or long filenames (which consume directory space == blocks, but not 
inodes), or they will have widely-striped files (which also consume xattr 
blocks).  The 2KB/inode ratio includes the space for the inode itself (512B in 
2.7.x, 1024B in 2.10), at least one directory entry (~64 bytes), some fixed 
overhead for the journal (up to 4GB on the MDT), and Lustre-internal overhead 
(OI entry = ~64 bytes, ChangeLog, etc.).

If you have a better idea of space usage at your site, you can specify 
different parameters.

> Mount this as ldiskfs, gives 369 M inodes.
> 
> One would assume that specifying one / some of the mke2fs-options here in the 
> mkfs.lustre-command will change nothing.
> 
> However,
> 
> mkfs.lustre --reformat --mgs --mdt ... --mkfsoptions="-I 1024" /dev/sdb1
> 
> says
> 
> device size = 944137MB
> formatting backing filesystem ldiskfs on /dev/sdb1
>       target name   test0:MDT0000
>       4k blocks     241699072
>       options       -I 1024 -J size=4096 -i 1536 -q -O 
> dirdata,uninit_bg,^extents,mmp,dir_nlink,quota,huge_file,flex_bg -E 
> lazy_journal_init -F
> 
> mkfs_cmd = mke2fs -j -b 4096 -L test0:MDT0000 -I 1024 -J size=4096 -i 1536 -q 
> -O dirdata,uninit_bg,^extents,mmp,dir_nlink,quota,huge_file,flex_bg -E 
> lazy_journal_init -F /dev/sdb1 241699072
> 
> and the mounted device now has 615 M inodes.
> 
> So, whatever makes the calculation for the "-i / bytes-per-inode" value 
> becomes ineffective if I specify the inode size by hand?

This is a bit surprising.  I agree that specifying the same inode size value as 
the default should not affect the calculation for the bytes-per-inode ratio.
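
As a rough cross-check, both inode counts reported above appear to follow 
directly from the -i value: device size divided by bytes-per-inode.  The sketch 
below (Python, with illustrative function and variable names) uses the block 
count and block size from the mkfs output.  It assumes a plain division and 
ignores the per-block-group rounding that mke2fs applies, so the result is only 
approximate.

```python
# Rough estimate of how many inodes mke2fs creates for a given -i value.
# Assumption: inode count ~= device bytes / bytes-per-inode. Real mke2fs
# rounds per block group, so the true count can differ slightly.

BLOCK_SIZE = 4096    # from "-b 4096" in the mkfs_cmd line
BLOCKS = 241699072   # "4k blocks" as reported by mkfs.lustre


def estimated_inodes(bytes_per_inode: int) -> int:
    """Return the approximate inode count for a bytes-per-inode ratio."""
    return BLOCKS * BLOCK_SIZE // bytes_per_inode


for ratio in (2560, 1536):
    inodes = estimated_inodes(ratio)
    # Express the count in units of 2**20, which is how "M" is read here.
    print(f"-i {ratio}: ~{inodes / 2**20:.0f} Mi inodes")
```

With M read as 2^20, the two estimates come out at about 369 M and 615 M, 
matching the mounted filesystems.  So the difference between the two runs is 
explained by the changed -i value alone, not by the inode size.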

> How many bytes-per-inode do I need?
> 
> This ratio, is it what the manual specifies as "one inode created for each 
> 2kB of LUN" ?

That was true with 512B inodes, but with the increase to 1024B inodes in 2.10 
(to allow for PFL file layouts, since they are larger) the inode ratio has also 
gone up by 512B, to 2560B/inode.

> Perhaps the raw size of an MDT device should better be such that it leads to 
> "-I 1024 -i 2048"?

Yes, that is probably reasonable, since the larger inode also means that there 
is less chance of external xattr blocks being allocated.
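
To see what such a choice costs in space, here is a small sketch (Python, 
illustrative names) that estimates the inode count, the space taken by the 
inode tables, and what is left over for directories, xattr blocks, and the rest 
for a given -I / -i pair.  It assumes the inode tables occupy exactly 
inode-count times inode-size and ignores the journal and other metadata 
overhead.  The "-I 512 -i 2048" row is my assumption of the older defaults, 
based on the 512B inode and 2KB ratio mentioned in this thread.

```python
# Sketch: inode count and inode-table footprint for a given -I / -i pair.
# Assumptions: inodes ~= device bytes / bytes-per-inode, and the inode
# tables take exactly inodes * inode_size bytes. The journal and other
# filesystem metadata are ignored.

BLOCK_SIZE = 4096    # from "-b 4096" in the mkfs_cmd line
BLOCKS = 241699072   # "4k blocks" as reported by mkfs.lustre


def mdt_layout(inode_size: int, bytes_per_inode: int):
    """Return (inode count, inode-table bytes, remaining bytes)."""
    device_bytes = BLOCKS * BLOCK_SIZE
    inodes = device_bytes // bytes_per_inode
    table_bytes = inodes * inode_size
    return inodes, table_bytes, device_bytes - table_bytes


# (inode size, bytes-per-inode): assumed old default, 2.10 default, proposal
for inode_size, ratio in ((512, 2048), (1024, 2560), (1024, 2048)):
    inodes, table, rest = mdt_layout(inode_size, ratio)
    print(
        f"-I {inode_size} -i {ratio}: "
        f"{inodes / 2**20:.0f} Mi inodes, "
        f"{table / 2**30:.0f} GiB inode tables, "
        f"{rest / 2**30:.0f} GiB remaining"
    )
```

With "-I 1024 -i 2048" the inode tables take half of the device, compared to 
40% with the 2.10 default of "-i 2560", in exchange for roughly 25% more 
inodes.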

Note that with ZFS there is no need to specify the inode ratio at all.  It will 
dynamically allocate inode blocks as needed, along with directory blocks, OI 
tables, etc., until the filesystem is full.

Cheers, Andreas

> On 01/26/2018 03:10 PM, Thomas Roth wrote:
>> Hi all,
>> what is the relation between raw device size and size of a formatted MDT? 
>> Size of inodes + free space = raw size?
>> The example:
>> MDT device has 922 GB in /proc/partitions.
>> Formatted under Lustre 2.5.3 with default values for mkfs.lustre resulted in 
>> a 'df -h' MDT of 692G and more importantly 462M inodes.
>> So, the space used for inodes + the 'df -h' output add up to the raw size:
>>  462M inodes * 0.5kB/inode + 692 GB = 922 GB
>> On that system there are now 330M files, more than 70% of the available 
>> inodes.
>> 'df -h' says '692G  191G  456G  30% /srv/mds0'
>> What do I need the remaining 450G for? (Or the ~400G left once all the 
>> inodes are eaten?)
>> Should the format command not be tuned towards more inodes?
>> Btw, on a Lustre 2.10.2 MDT I get 369M inodes and 550 G space (with a 922G 
>> raw device): inode size is now 1024.
>> However, according to the manual and various Jira/Ludocs the size should be 
>> 2k nowadays?
>> Actually, the command within mkfs.lustre reads
>> mke2fs -j -b 4096 -L test0:MDT0000  -J size=4096 -I 1024 -i 2560  -F 
>> /dev/sdb 241699072
>> -i 2560 ?
>> Cheers,
>> Thomas
> 
> -
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

--
Andreas Dilger
Lustre Principal Architect
Intel Corporation







