The DNE auto-split functionality is disabled by default and was never fully
completed (e.g. it does not preserve inode numbers), because splitting a
directory that was currently in use (which is exactly when you would want to
use it) had a significant performance/latency impact. I wouldn't recommend
using it at this time.
Instead, development efforts were focussed on DNE MDT space balancing. This
adds two different features that allow all of the MDTs in a filesystem to be
used without user/admin intervention (though it is still possible to manually
create directories on specific MDTs as before).
The "round-robin" MDT selection ("lfs setdirstripe -D --max-depth-rr=N -c 1 -i
-1") for top-level directories (enabled for the top 3 levels of the filesystem
by default) will, as the name suggests, round robin new directories across all
of the available MDTs, when their space is evenly balanced (within 5% free
space*inodes by default). That is important to distribute *new* directories
across MDTs in new filesystems when e.g. .../home/$user or .../project/$project
or .../scratch/$user are being created.
The "space balance" MDT selection ("lctl set_param lmv.*.qos_threshold_rr=N" on
the *CLIENT*) kicks in when MDT space usage becomes imbalanced (free
space*inodes difference above 5% by default), and then starts selecting the MDT
for *new* directories based on the ratio of free space*inodes. That allows the
MDTs to return toward balance over time, without causing a performance
imbalance when it isn't necessary.
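To make the two selection modes concrete, here is a rough Python sketch of the
decision described above. It is only an illustration and not the actual LMV
code: the MDT class, the is_balanced() helper, the MDTSelector class and the
"(max - min) / max" definition of imbalance are all assumptions made for this
example. Only the 5% default and the free space*inodes weighting come from the
description above.

```python
import random
from dataclasses import dataclass


@dataclass
class MDT:
    """Hypothetical stand-in for one MDT's free-space statistics."""
    index: int
    free_blocks: int
    free_inodes: int

    @property
    def weight(self) -> int:
        # "free space*inodes", as in the description above
        return self.free_blocks * self.free_inodes


def is_balanced(mdts, threshold=0.05):
    """Assumed imbalance test: relative spread of the weights within threshold."""
    weights = [m.weight for m in mdts]
    highest = max(weights)
    if highest == 0:
        return True  # nothing free anywhere, so nothing to balance by
    return (highest - min(weights)) / highest <= threshold


class MDTSelector:
    """Pick an MDT for each *new* directory."""

    def __init__(self, mdts, threshold=0.05, rng=None):
        self.mdts = list(mdts)
        self.threshold = threshold
        self.rng = rng or random.Random()
        self._next = 0

    def pick(self) -> MDT:
        if is_balanced(self.mdts, self.threshold):
            # Balanced: plain round-robin across all MDTs.
            mdt = self.mdts[self._next % len(self.mdts)]
            self._next += 1
            return mdt
        # Imbalanced: choose in proportion to free space*inodes, so that
        # emptier MDTs receive more of the new directories.
        weights = [m.weight for m in self.mdts]
        return self.rng.choices(self.mdts, weights=weights, k=1)[0]
```

With four equally empty MDTs this hands out indices 0, 1, 2, 3, 0, 1, ... in
order. Once one MDT is much fuller than the others, most new directories land
on the emptier ones until the weights converge again.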
Note that both of these heuristics operate on *single-stripe directories* and
not regular files, so the MDT balance will not be perfect if some directory
tree has millions more files/subdirectories than another. However, the main
issue being avoided is the *very* common case of MDT0000 getting full and
MDT0001..N being (almost) totally unused. These features make the MDT
*usage* balance pretty good as well, so it is a win-win. For most
filesystems, the MDT capacity is not the limiting factor (it only makes up a
few percent of the total storage).
Cheers, Andreas
On Mar 23, 2023, at 15:31, Bertschinger, Thomas Andrew Hjorth via
lustre-discuss <[email protected]> wrote:
Hello,
We've been experimenting with DNEv3 recently and have run into this issue:
https://jira.whamcloud.com/browse/LU-7607 where the directory inode number
changes after auto-split.
In addition to the problem noted with backups that track the inode number, we
have found that file access through a previously open file descriptor is broken
post migration. This can occur when a shell's CWD is the affected directory.
For example:
mds0 # lctl get_param mdt.mylustre-MDT0000.{dir_split_count,enable_dir_auto_split}
mdt.mylustre-MDT0000.dir_split_count=100
mdt.mylustre-MDT0000.enable_dir_auto_split=1
client $ pwd
/mnt/mylustre/dnetest
client $ for i in {0..100}; do touch file$i; done
client $ ls
ls: cannot open directory '.': Operation not permitted
client $ ls file0
ls: cannot access 'file0': No such file or directory
client $ ls /mnt/mylustre/dnetest/file0
/mnt/mylustre/dnetest/file0
(This is from a build of the current master branch.)
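For tools that hold a directory open for a long time, one possible way to
notice this condition is to compare the inode behind the open descriptor with
the inode the path currently resolves to. The Python sketch below is only an
illustration: dir_identity_changed() and reopen_if_stale() are hypothetical
helper names, and treating any OSError as "changed" is an assumption, since
the exact errno returned after a split may vary.

```python
import os


def dir_identity_changed(dir_fd: int, path: str) -> bool:
    """Return True if `path` no longer refers to the directory open on `dir_fd`."""
    try:
        held = os.fstat(dir_fd)   # identity of the directory we opened earlier
        current = os.stat(path)   # identity the path resolves to right now
    except OSError:
        # Assumption: a failure on either call means the old handle is unusable.
        return True
    return (held.st_dev, held.st_ino) != (current.st_dev, current.st_ino)


def reopen_if_stale(dir_fd: int, path: str) -> int:
    """Return a descriptor for `path`, reopening it if the old one is stale."""
    if not dir_identity_changed(dir_fd, path):
        return dir_fd
    os.close(dir_fd)
    return os.open(path, os.O_RDONLY | os.O_DIRECTORY)
```

This does not help an interactive shell whose CWD is the affected directory;
there the user would still have to cd to the full path again.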
We believe users will certainly encounter this, because users monitor output
directories of jobs as they run. Therefore this issue is a dealbreaker with
DNEv3 for us.
I wanted to ask about the status of the linked issue, since it looks like it
hasn't been updated in a while. Would the resolution to LU-7607 be expected to
fix the file access problem I've noted here or will this require additional
changes to resolve?
Thanks!
- Thomas Bertschinger
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
--
Andreas Dilger
Lustre Principal Architect
Whamcloud