Benjamin R. Haskell
Sat, 06 Feb 2010 14:46:59 -0800
On Sat, 6 Feb 2010, Tony Abernethy wrote:
> Grarpamp wrote:
> > Yes, hardlinks save data block duplication... yet on filesystems
> > with millions of files / long pathnames, just the directory entries
> > alone can take up gigs per image. Multiply that out by frequency and
> > it can quickly add up.
>
> Huh?

That seemed excessive to me, too, but a short test suggests it's accurate.
I tried the following (in Zsh):
mkdir /tmp/rsync-test
cd /tmp/rsync-test
for l in {00..99}/{00..99}/{00..99} ; do mkdir -p $l ; done
And, it's a slow process on my machine, but each set of 10,000
directories seems to add about 40MB on an ext3 filesystem.
E.g., once the '00/{00..99}/{00..99}' directories existed, 'du -sh' showed
~40MB. After '{00..01}/{00..99}/{00..99}' were done, ~80MB.
As I write this, with '{00..04}/{00..99}/{00..99}' done, ~199MB.
I'm not sure the path lengths add any overhead (unless you just meant as
a result of having more directories). Each dir adds 4K on my system
(though I know larger directories add multiples of that block size).
So, with millions of files, yes, the directory entries alone could
add up to gigs of space.
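
As a rough check of that arithmetic, here is a minimal sketch that assumes every directory costs exactly one 4K block, as observed on the ext3 system above. The function name dir_overhead_mib and the one-million figure are hypothetical, chosen only for illustration; larger directories would cost more than this estimate.

```shell
# Assumption: each directory occupies one 4096-byte block (the ext3 case above).
block_size=4096

# dir_overhead_mib is a hypothetical helper: prints the estimated overhead
# in MiB (integer division) for the given number of directories.
dir_overhead_mib() {
    echo $(( $1 * block_size / 1024 / 1024 ))
}

dir_overhead_mib 10000      # one batch from the test above
dir_overhead_mib 1000000    # a hypothetical tree with a million directories
```

The first call comes out just under 40 MiB, in line with the 'du -sh' figures, and the second lands in the gigabyte range for a single image.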
Interestingly, the same test on reiserfs (which I tend to use for fs'es
with many small files) seems to show about 200 *K* per 10,000
directories (about 72 bytes per directory). It seems pretty highly
dependent on choice of fs.
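
To compare filesystems yourself, something like the following might do. It is only a sketch: avg_dir_bytes is a hypothetical helper name, it assumes the tree contains nothing but directories (as in the test above), and since 'du -sk' rounds to whole KiB the result is approximate.

```shell
# avg_dir_bytes is a hypothetical helper: approximate on-disk bytes per
# directory under the tree given as $1 (assumes the tree holds only directories).
avg_dir_bytes() {
    dir_count=$(find "$1" -type d | wc -l)        # includes the top directory
    kib=$(du -sk "$1" | awk '{print $1}')         # total usage in KiB
    echo $(( kib * 1024 / dir_count ))
}

# Example on a small scratch tree (path comes from mktemp, nothing is assumed to exist).
scratch=$(mktemp -d)
mkdir -p "$scratch/00/00" "$scratch/00/01" "$scratch/01/00"
avg_dir_bytes "$scratch"
```

Running it on the same tree copied to different filesystems should show how much the per-directory cost varies.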
But that overhead is incurred with --link-dest, too: only files are
hard-linked, while every directory is created anew in each backup. Even
without any changes:
for l in original/00/{00..99}/{00..99} ; do
mkdir -p $l
touch $l/file
done
rsync -av original/ firstbackup/
rsync -av --link-dest=`pwd`/original original/ secondbackup/
du -sh original firstbackup secondbackup
40M original
40M firstbackup
40M secondbackup
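
The result above makes sense given that directories cannot be hard-linked, so each backup needs its own copy of the directory tree. The sketch below emulates by hand, with mkdir and ln, what --link-dest does for one unchanged file. It does not invoke rsync, and the scratch paths and the inode helper are hypothetical.

```shell
# inode is a hypothetical helper: prints the inode number of a file or directory.
inode() {
    ls -di "$1" | awk '{print $1}'
}

tmp=$(mktemp -d)
mkdir -p "$tmp/first/00/00"
touch "$tmp/first/00/00/file"

# The second snapshot gets freshly created directories...
mkdir -p "$tmp/second/00/00"
# ...while the unchanged file is just a hard link to the earlier copy.
ln "$tmp/first/00/00/file" "$tmp/second/00/00/file"

# The two file paths share one inode; the two directories do not.
inode "$tmp/first/00/00/file"
inode "$tmp/second/00/00/file"
inode "$tmp/first/00/00"
inode "$tmp/second/00/00"
```

The file's data and inode are shared between the two snapshots, but every directory is a new object and takes its own space.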
--
Best,
Ben