Alan Curry wrote:
...
> By comparison to a proper tool which doesn't do any unnecessary traversals of
> extra directories, your use of du is slow and brittle (if the user forgets
> an alternate directory containing a link, the result is wrong) and has only
> the slight advantage of already being implemented.
>
> Here's a working outline of the single-traversal method. I wouldn't suggest
> that du should contain equivalent code. A single-purpose perl script, even
> without pretty output formatting, feels clean enough to me. Since I've gone
> to the trouble (not much) of writing it, I'll keep it as ~/bin/predict_rm_rf
> for future use.
>
> #!/usr/bin/perl -W
> use strict;
> use File::Find;
>
> @ARGV or die "Usage: $0 directory [directory ...]\n";
>
> my $total = 0;
> my %pending = ();
>
> File::Find::find({wanted => sub {
>   my ($dev,$ino,$nlink,$blocks) = (lstat($_))[0,1,3,12];
>   if(-d _ || $nlink==1) {
>     $total += $blocks;
>     return;
>   }
>   if($nlink == ++$pending{"$dev.$ino"}) {
>     delete $pending{"$dev.$ino"};
>     $total += $blocks;
>   }
> }}, @ARGV);
>
> print "$total blocks would be freed by rm -rf @ARGV\n";

That seems useful.
However, the number it prints is too large whenever it processes
a file or directory more than $nlink times, e.g., when invoked as
predict_rm_rf F F it prints double the correct number.

To account for that, the script must record every dev/ino pair it
processes, say via:

  File::Find::find({wanted => sub {
    my ($dev,$ino,$nlink,$blocks) = (lstat($_))[0,1,3,12];
    defined $pending{"$dev.$ino"} && $pending{"$dev.$ino"} < 0
      and return;
    if(-d _ || $nlink==1 || $nlink == ++$pending{"$dev.$ino"}) {
      $total += $blocks;
      $pending{"$dev.$ino"} = -1;
      return;
    }
  }}, @ARGV);

Note that for a large tree, the perl code will be far less efficient
than C code like du because:

  - the perl script must call lstat for every single entry (du can
    use dirent.d_ino on some file systems).  When I checked about a
    year ago, Perl still had no good way to get something like
    dirent.d_ino.
  - du uses a compact representation for a device/inode pair, so may
    use a lot less memory.
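
For reference, here is a rough sketch of how the modified callback above might fit into the complete script. It is not a polished tool. The function name predict_freed_blocks, the renaming of %pending to %seen, and the guard against a failed lstat are assumptions added for illustration and are not part of the original outline. The `//` operator needs Perl 5.10 or later. Like the snippet above, it counts links rather than distinct directory entries, so a multiply-linked file reached twice through the same name can still be miscounted.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find ();

# Hypothetical helper (the name is an assumption): returns the number of
# st_blocks that removing the given trees is predicted to free.
sub predict_freed_blocks {
    my @roots = @_;
    my $total = 0;
    my %seen;    # "$dev.$ino" => links seen so far, or -1 once counted

    File::Find::find({ wanted => sub {
        my ($dev, $ino, $nlink, $blocks) = (lstat($_))[0, 1, 3, 12];
        return unless defined $dev;          # lstat failed (added guard)
        my $key = "$dev.$ino";
        return if ($seen{$key} // 0) < 0;    # already added to the total
        # Directories and singly-linked files are freed outright; other
        # files are freed only once every one of their links has been seen.
        if (-d _ || $nlink == 1 || $nlink == ++$seen{$key}) {
            $total += $blocks;
            $seen{$key} = -1;
        }
    }}, @roots);

    return $total;
}

if (@ARGV) {
    my $total = predict_freed_blocks(@ARGV);
    print "$total blocks would be freed by rm -rf @ARGV\n";
}
```

Invoked as "predict_rm_rf F F" with a singly-linked F, this version should count F once rather than twice.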