On Tue, Dec 01, 2009 at 09:28:50AM -0500, Jeffrey J. Kosowsky wrote:
> Pieter Wuille wrote at about 13:18:33 +0100 on Tuesday, December 1, 2009:
>  > What you can do is count the allocated space for each directory and
>  > file, but divide the numbers for files by (nHardlinks+1). This way you
>  > end up distributing the size each file takes on disk over the different
>  > backups it belongs to.
>  > 
>  > I have a script that does this; if there's interest I'll attach it. It
>  > does take a day (wild guess, never accurately measured) to go over all
>  > pc/* directories (the pool is 370.65GB comprising 4237093 files and
>  > 4369 directories).
> 
> I am surprised that it would take a day.
The server is quite busy making backups, and rsync'ing to an offsite backup
server at the same time -- especially the latter puts some serious load on 
I/O, I assume.

> The only real cost should be that of doing a 'find' and a 'stat' on
> the pc tree - which I would do in Perl so that I could do the
> arithmetic in place (rather than having to use a *nix find -printf to
> pass it off to another program).
Yes, it is a perl script.

> Unless you have a huge number of pc's and backups, I can't imagine
> this would take more than a couple of hours since your total number of
> unique files is only about 4 million.
We have 4 million unique inodes. We do, however, have some 20-25 million
directory entries, which is what the script needs to read through.

> Given that you only have 4 million unique files, you could even avoid
> the multiple stats at the cost of that much memory by caching the
> nlinks and size by inode number.
Except that the script already needs to do a stat per directory entry in order
to know the inode number itself...
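For what it's worth, a minimal sketch of that caching idea (hypothetical
code with made-up names, not what the attached script does) makes the
catch visible:

   my %cache;   # "dev:ino" => [nlink, size], filled on first encounter
   my ($dev, $ino, $nlink, $size) = (lstat $entry)[0, 1, 3, 7];
   $cache{"$dev:$ino"} ||= [$nlink, $size];
   # ...but the lstat above already had to run just to learn $ino,
   # so the cache spares no stat calls at all.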

> Can you post your script?

See attachment. You can run, e.g.:

   ./diffsize.pl /var/lib/backuppc/pc/*

to see values per host, and a total.
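The keys in each line are: dalloc/dsize for directories, falloc/fsize for
regular files (block usage and apparent size, already divided over the hard
links), alloc = dalloc + falloc, fcount the link-weighted file count, and
dentries the number of directory entries scanned. All sizes are in bytes.
A line looks roughly like this (numbers purely illustrative):

   /var/lib/backuppc/pc/hosta: alloc=12884901888 dalloc=73400320 dentries=521334 dsize=69206016 falloc=12811501568 fcount=40211.4 fsize=12790000000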

PS: it actually (correctly) divides by (nHardLinks-1) instead of +1 (as I
claimed earlier).
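(In BackupPC every pooled file carries one extra hard link from the pool
tree itself, so a file that appears in, say, three backups has nlink = 4;
dividing by 4 - 1 = 3 charges each backup exactly one third of it. That
extra link is what $BASELINK in the script accounts for.)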

kind regards,

-- 
Pieter
#!/usr/bin/perl -w

use strict;
use Fcntl ':mode';

# Links per file that are not charged to any backup: in BackupPC this is
# the one hard link from the pool tree itself.
my $BASELINK = 1;

my %tsize;   # grand totals across all arguments

$| = 1;      # unbuffered output: per-host lines appear as soon as ready

foreach my $path (@ARGV) {
  my @todo = ($path);   # explicit stack instead of recursion
  my %size;             # per-argument counters
  while (@todo) {
    my $cur = pop @todo;
    # lstat rather than stat, so symlinks are neither followed nor counted twice
    my ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,
        $atime,$mtime,$ctime,$blksize,$blocks) = lstat $cur;
    next unless defined $mode;   # entry vanished mid-run, or unreadable
    if (S_ISDIR($mode)) {
      $size{dalloc} += $blocks * 512;   # $blocks counts 512-byte units
      $size{alloc}  += $blocks * 512;
      $size{dsize}  += $size;
      opendir(my $dh, $cur) or next;
      push @todo, map { $size{dentries}++; "$cur/$_" }
                  grep { $_ ne '.' && $_ ne '..' } readdir $dh;
      closedir $dh;
    } elsif (S_ISREG($mode)) {
      # Spread the file's cost over its hard links, minus the pool link.
      $nlink -= $BASELINK if $nlink > $BASELINK;
      $size{falloc} += $blocks * 512 / $nlink;
      $size{alloc}  += $blocks * 512 / $nlink;
      $size{fsize}  += $size / $nlink;
      $size{fcount} += 1 / $nlink;
    }
  }
  foreach my $key (keys %size) { $tsize{$key} += $size{$key}; }
  print "$path: " . join(' ', map { "$_=$size{$_}" } sort keys %size) . "\n";
}
print "total: " . join(' ', map { "$_=$tsize{$_}" } sort keys %tsize) . "\n";