Thanks for the great and informative answer!

On Mon, Feb 5, 2018 at 5:19 AM, Dilger, Andreas <[email protected]> wrote:
> On Feb 4, 2018, at 13:10, E.S. Rosenberg <[email protected]> wrote:
> >
> > On Sat, Feb 3, 2018 at 4:45 AM, Dilger, Andreas <[email protected]> wrote:
> >> On Jan 26, 2018, at 07:56, Thomas Roth <[email protected]> wrote:
> >> >
> >> > Hmm, option-testing leads to more confusion:
> >> >
> >> > With this 922GB-sdb1 I do
> >> >
> >> >   mkfs.lustre --reformat --mgs --mdt ... /dev/sdb1
> >> >
> >> > The output of the command says
> >> >
> >> >   Permanent disk data:
> >> >   Target:     test0:MDT0000
> >> >   ...
> >> >
> >> >   device size = 944137MB
> >> >   formatting backing filesystem ldiskfs on /dev/sdb1
> >> >           target name   test0:MDT0000
> >> >           4k blocks     241699072
> >> >           options       -J size=4096 -I 1024 -i 2560 -q -O dirdata,uninit_bg,^extents,mmp,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F
> >> >
> >> >   mkfs_cmd = mke2fs -j -b 4096 -L test0:MDT0000 -J size=4096 -I 1024 -i 2560 -q -O dirdata,uninit_bg,^extents,mmp,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F /dev/sdb1 241699072
> >>
> >> The default options have to be conservative, as we don't know in
> >> advance how a filesystem will be used. It may be that some sites will
> >> have lots of hard links or long filenames (which consume directory
> >> space == blocks, but not inodes), or they will have widely-striped
> >> files (which also consume xattr blocks). The 2KB/inode ratio includes
> >> the space for the inode itself (512B in 2.7.x, 1024B in 2.10), at
> >> least one directory entry (~64 bytes), some fixed overhead for the
> >> journal (up to 4GB on the MDT), and Lustre-internal overhead (OI
> >> entry = ~64 bytes), ChangeLog, etc.
> >>
> >> If you have a better idea of space usage at your site, you can
> >> specify different parameters.
> >>
> >> > Mounting this as ldiskfs gives 369M inodes.
> >> >
> >> > One would assume that specifying one or some of the mke2fs options
> >> > here in the mkfs.lustre command would change nothing.
> >> >
> >> > However,
> >> >
> >> >   mkfs.lustre --reformat --mgs --mdt ... --mkfsoptions="-I 1024" /dev/sdb1
> >> >
> >> > says
> >> >
> >> >   device size = 944137MB
> >> >   formatting backing filesystem ldiskfs on /dev/sdb1
> >> >           target name   test0:MDT0000
> >> >           4k blocks     241699072
> >> >           options       -I 1024 -J size=4096 -i 1536 -q -O dirdata,uninit_bg,^extents,mmp,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F
> >> >
> >> >   mkfs_cmd = mke2fs -j -b 4096 -L test0:MDT0000 -I 1024 -J size=4096 -i 1536 -q -O dirdata,uninit_bg,^extents,mmp,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F /dev/sdb1 241699072
> >> >
> >> > and the mounted device now has 615M inodes.
> >> >
> >> > So, whatever makes the calculation for the "-i / bytes-per-inode"
> >> > value becomes ineffective if I specify the inode size by hand?
> >>
> >> This is a bit surprising. I agree that specifying the same inode size
> >> value as the default should not affect the calculation for the
> >> bytes-per-inode ratio.
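[For anyone hitting the same surprise: a minimal, untested sketch of
pinning both the inode size and the bytes-per-inode ratio explicitly, so
neither can be recalculated behind your back. The --fsname/--index flags
below only stand in for the "..." in Thomas' command and are illustrative;
the -I/-i values are the 2.10 defaults quoted above, not a recommendation.]

    mkfs.lustre --reformat --mgs --mdt --fsname=test0 --index=0 \
        --mkfsoptions='-I 1024 -i 2560' /dev/sdb1

    # verify what was actually created (dumpe2fs -h is read-only):
    dumpe2fs -h /dev/sdb1 | grep -E 'Inode size|Inode count'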
> >> > How many bytes-per-inode do I need?
> >> >
> >> > This ratio, is it what the manual specifies as "one inode created
> >> > for each 2kB of LUN"?
> >>
> >> That was true with 512B inodes, but with the increase to 1024B inodes
> >> in 2.10 (to allow for PFL file layouts, since they are larger) the
> >> inode ratio has also gone up by 512B, to 2560B/inode.
> >
> > Does this mean that someone who updates their servers from 2.x to 2.10
> > will not be able to use PFL, since the MDT was formatted in a way that
> > can't support it? (In our case, formatted under Lustre 2.5, currently
> > running 2.8.)
>
> It will be possible to use PFL layouts with older MDTs, but there may be
> a performance impact if the MDTs are HDD based, because a multi-component
> PFL layout is unlikely to fit into the 512-byte inode, so they will
> allocate an extra xattr block for each PFL file. For SSD-based MDTs the
> extra seek is not likely to impact performance significantly, but for
> HDD-based MDTs this extra seek for accessing every file will reduce
> metadata performance.
>
> If you formatted the MDT filesystem for a larger default stripe count
> (e.g. used "mkfs.lustre ... --stripe-count-hint=8" or more) then you will
> already have 1024-byte inodes, and this is a non-issue.
>
> That said, the overall impact on your applications may be minimal if you
> do not have metadata-intensive workloads, and PFL can help improve IO
> performance, because many users do not set proper striping on their
> files.
>
> Of course, if you know in advance what the best striping for a file is,
> and your applications or users already use that, then PFL is not
> necessary and there is no performance impact if PFL is not used.
>
> Cheers, Andreas
>
> >> > Perhaps the raw size of an MDT device should better be such that it
> >> > leads to "-I 1024 -i 2048"?
> >>
> >> Yes, that is probably reasonable, since the larger inode also means
> >> that there is less chance of external xattr blocks being allocated.
> >>
> >> Note that with ZFS there is no need to specify the inode ratio at all.
> >> It will dynamically allocate inode blocks as needed, along with
> >> directory blocks, OI tables, etc., until the filesystem is full.
> >>
> >> Cheers, Andreas
> >>
> >> > On 01/26/2018 03:10 PM, Thomas Roth wrote:
> >> >> Hi all,
> >> >> what is the relation between raw device size and the size of a
> >> >> formatted MDT? Size of inodes + free space = raw size?
> >> >>
> >> >> The example:
> >> >> The MDT device has 922 GB in /proc/partitions.
> >> >> Formatted under Lustre 2.5.3 with default values for mkfs.lustre,
> >> >> this resulted in a 'df -h' MDT of 692G and, more importantly,
> >> >> 462M inodes.
> >> >> So the space used for inodes + the 'df -h' output add up to the
> >> >> raw size:
> >> >>   462M inodes * 0.5kB/inode + 692 GB = 922 GB
> >> >>
> >> >> On that system there are now 330M files, more than 70% of the
> >> >> available inodes.
> >> >> 'df -h' says '692G 191G 456G 30% /srv/mds0'
> >> >> What do I need the remaining 450G for? (Or the ~400G left once
> >> >> all the inodes are eaten?)
> >> >> Should the format command not be tuned towards more inodes?
> >> >>
> >> >> Btw, on a Lustre 2.10.2 MDT I get 369M inodes and 550G of space
> >> >> (with a 922G raw device): the inode size is now 1024.
> >> >> However, according to the manual and various Jira/Ludocs the size
> >> >> should be 2k nowadays?
> >> >> Actually, the command within mkfs.lustre reads
> >> >>   mke2fs -j -b 4096 -L test0:MDT0000 -J size=4096 -I 1024 -i 2560 -F /dev/sdb 241699072
> >> >> -i 2560 ?
> >> >>
> >> >> Cheers,
> >> >> Thomas
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Intel Corporation
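[Following up for the archives: a sketch of how one might check whether an
existing MDT has 512- or 1024-byte inodes, and what a multi-component PFL
layout looks like from a 2.10+ client. The device, mount point, and
component boundaries are made-up examples, not recommendations.]

    # on the MDS; dumpe2fs -h only reads the superblock:
    dumpe2fs -h /dev/sdb1 | grep 'Inode size'

    # on a client: 1 stripe up to 64M, 4 stripes up to 1G, 8 stripes to EOF
    lfs setstripe -E 64M -c 1 -E 1G -c 4 -E -1 -c 8 /mnt/lustre/testdir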
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
