The DNE auto-split functionality is disabled by default and was never fully
completed (e.g. it does not preserve inode numbers).  It had a significant
performance/latency impact while splitting a directory that was currently in
use (which is exactly when you would want it to run), so I wouldn't recommend
using it at this time.

Instead, development efforts were focussed on DNE MDT space balancing.  This 
adds two different features that allow all of the MDTs in a filesystem to be 
used without user/admin intervention (though it is still possible to manually 
create directories on specific MDTs as before).

The "round-robin" MDT selection
("lfs setdirstripe -D --max-inherit-rr=N -c 1 -i -1") for top-level
directories (enabled for the top 3 levels of the filesystem by default) will,
as the name suggests, round-robin new directories across all of the available
MDTs when their space is evenly balanced (within 5% free space*inodes by
default).  That is important to distribute *new* directories across MDTs in
new filesystems when e.g. .../home/$user or .../project/$project or
.../scratch/$user are being created.

The "space balance" MDT selection ("lctl set_param lmv.*.qos_threshold_rr=N" on 
the *CLIENT*) kicks in when MDT space usage becomes imbalanced (free 
space*inodes difference above 5% by default), and then starts selecting the MDT 
for *new* directories based on the ratio of free space*inodes.  That allows the 
MDTs to return toward balance over time, without causing a performance 
imbalance when it isn't necessary.
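
A minimal sketch of that switch between the two modes, assuming the imbalance
is measured as the relative gap between the largest and smallest
"free space * free inodes" weight.  The exact formula Lustre uses is not
given here, and all of the names below are hypothetical:

```python
import random


def mdt_weights(mdts):
    """Map MDT index -> free_bytes * free_inodes (assumed weight)."""
    return {idx: free_bytes * free_inodes
            for idx, (free_bytes, free_inodes) in mdts.items()}


def is_imbalanced(weights, threshold_pct=5.0):
    """True if the weight gap between MDTs exceeds the threshold."""
    hi = max(weights.values())
    lo = min(weights.values())
    return hi > 0 and (hi - lo) * 100.0 / hi > threshold_pct


def pick_mdt(mdts, rr_next, threshold_pct=5.0, rng=random):
    """Pick an MDT index for a *new* directory.

    mdts maps MDT index -> (free_bytes, free_inodes); rr_next is the
    index that plain round-robin would have used.
    """
    weights = mdt_weights(mdts)
    if not is_imbalanced(weights, threshold_pct):
        # Balanced enough: keep the cheap round-robin choice.
        return rr_next
    # Imbalanced: choose at random in proportion to the weights,
    # so emptier MDTs receive more of the new directories.
    indices = sorted(weights)
    return rng.choices(indices,
                       weights=[weights[i] for i in indices], k=1)[0]
```

The weighted choice only biases where *new* directories go; nothing in this
sketch migrates existing directories.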

Note that both of these heuristics operate on *single-stripe directories* and 
not regular files, so the MDT balance will not be perfect if some directory 
tree has millions more files/subdirectories than another.  However, the main 
issue being avoided is the *very* common case of MDT0000 getting full and 
MDT0001..N being (almost) totally unused.  These features also make the MDT
*usage* balance pretty good as a result, so it is a win-win.  For most
filesystems, the MDT capacity is not the limiting factor (it only makes up a
few percent of the total storage).

Cheers, Andreas

On Mar 23, 2023, at 15:31, Bertschinger, Thomas Andrew Hjorth via
lustre-discuss <lustre-discuss@lists.lustre.org> wrote:

Hello,

We've been experimenting with DNEv3 recently and have run into this issue: 
https://jira.whamcloud.com/browse/LU-7607 where the directory inode number 
changes after auto-split.

In addition to the problem noted with backups that track the inode number, we 
have found that file access through a previously open file descriptor is broken 
post migration. This can occur when a shell's CWD is the affected directory. 
For example:

mds0 # lctl get_param mdt.mylustre-MDT0000.{dir_split_count,enable_dir_auto_split}
mdt.mylustre-MDT0000.dir_split_count=100
mdt.mylustre-MDT0000.enable_dir_auto_split=1

client $ pwd
/mnt/mylustre/dnetest
client $ for i in {0..100}; do touch file$i; done
client $ ls
ls: cannot open directory '.': Operation not permitted
client $ ls file0
ls: cannot access 'file0': No such file or directory
client $ ls /mnt/mylustre/dnetest/file0
/mnt/mylustre/dnetest/file0

(This is from a build of the current master branch.)
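
A rough way to check for this from a script is to compare the inode seen
through a descriptor that was opened earlier with the inode the path resolves
to now.  This is only a sketch: the helper name is hypothetical, and it
assumes fstat() on the old descriptor still succeeds after the split, which
has not been verified here.

```python
import os


def dir_inode_changed(path, dir_fd):
    """Return True if path now resolves to a different inode than dir_fd."""
    held = os.fstat(dir_fd)      # inode behind the already-open descriptor
    current = os.stat(path)      # inode the path resolves to right now
    return (held.st_dev, held.st_ino) != (current.st_dev, current.st_ino)


# Usage: open the directory first, do the work that might trigger a
# split, then compare.  The current directory stands in for the test dir.
fd = os.open(".", os.O_RDONLY)
try:
    changed = dir_inode_changed(".", fd)
finally:
    os.close(fd)
```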

We believe users will certainly encounter this, because users monitor output 
directories of jobs as they run. Therefore this issue is a dealbreaker with 
DNEv3 for us.

I wanted to ask about the status of the linked issue, since it looks like it 
hasn't been updated in a while. Would the resolution to LU-7607 be expected to 
fix the file access problem I've noted here or will this require additional 
changes to resolve?

Thanks!

- Thomas Bertschinger
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

--
Andreas Dilger
Lustre Principal Architect
Whamcloud