Mikhael Goikhman <[EMAIL PROTECTED]> writes:

>   % revision=archzoom--devel--0--patch-300
>   % cd `tla library-find $revision`/..
>   % tar cf - --exclude $revision/,,patch-set --exclude $revision/,,index \
>     --exclude $revision/,,index-by-name $revision | gzip -9 >$revision.tar.gz
>   % du -s --block-size=1 $revision
>   % ls -s --block-size=1 $revision.tar.gz
>   3403776 archzoom--devel--0--patch-300
>   163840 archzoom--devel--0--patch-300.tar.gz
>
> The ratio is 21. There is a small, but increasing gain when compared with
> earlier revisions (18), in particular because {arch} contains a lot of
> small files that are compressed nicely. Probably better than hardlinking.

You're comparing the size of a *single* revision directory against
tar+gz.  That comparison doesn't say much: by design, the hard link
trick saves space *across* several revisions, by letting them share
the files that did not change between them.
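To make that concrete, here is a minimal sketch (assuming GNU coreutils
and a Unix-style filesystem; the `patch-1`/`patch-2` names are invented
for illustration) of two revision trees sharing an unchanged file:

```shell
# Sketch: two revision trees sharing an unchanged file via hard links
# occupy the space of one copy, not two.
demo=$(mktemp -d)
mkdir "$demo/patch-1"
dd if=/dev/zero of="$demo/patch-1/unchanged" bs=1024 count=100 2>/dev/null
cp -al "$demo/patch-1" "$demo/patch-2"   # GNU cp: copy the tree, hard-linking files
stat -c %h "$demo/patch-1/unchanged"     # link count is now 2
du -sk "$demo"                           # total stays around 100K: one copy on disk
```

`du` counts each inode once per invocation, which is why the total for
both trees stays near the size of a single tree.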

> Please don't forget that a hardlink costs more than 0,

Can you elaborate on that?

> and also that for
> every merged external revision there are at least 2 more files, in {arch}
> and ,,patch-log/, and possibly new subdirs too (not hardlink-able).

Right.

> For me (and for du/rm) it is not the size, but number of inodes that is
> more important, so this very CPU expensive solution would not solve much.

There are several good papers on the topic [0,1,2].  I'm pretty
confident that hard link + gzip of individual files would yield a better
compression ratio than keeping several whole revision tarballs, *when*
several subsequent revisions are kept.
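A rough way to see why, sketched with a hypothetical two-revision
library (the `rev-1`/`rev-2` names and the single incompressible file
are my invention; assumes GNU coreutils and tar): hard links store the
unchanged file once, while per-revision tarballs store it once *per
revision*.

```shell
# Hypothetical two-revision library with one unchanged, incompressible file.
lib=$(mktemp -d)
mkdir "$lib/rev-1"
head -c 100000 /dev/urandom > "$lib/rev-1/data"   # incompressible payload
cp -al "$lib/rev-1" "$lib/rev-2"                  # GNU cp: hard-link the tree
tar czf "$lib/rev-1.tar.gz" -C "$lib" rev-1
tar czf "$lib/rev-2.tar.gz" -C "$lib" rev-2
du -sk "$lib/rev-1" "$lib/rev-2"   # hard-linked trees: ~100K combined
du -k "$lib"/rev-*.tar.gz          # tarballs: ~100K *each*
```

With more revisions the gap widens: each new tarball re-stores every
unchanged file, while each new hard-linked tree adds only the delta.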

Thanks,
Ludovic.

[0] http://ssrc.cse.ucsc.edu/Papers/you-mss04.pdf
[1] http://ssrc.cse.ucsc.edu/Papers/you-icde05.pdf
[2] http://www.usenix.org/events/usenix04/tech/general/full_papers/kulkarni/kulkarni_html/paper.html


_______________________________________________
Gnu-arch-users mailing list
Gnu-arch-users@gnu.org
http://lists.gnu.org/mailman/listinfo/gnu-arch-users

GNU arch home page:
http://savannah.gnu.org/projects/gnu-arch/