Paul Hammant <p...@hammant.org> writes:

> Git doesn't store deltas, and uses a DEFLATE algorithm for
> storage. Diffs are meaningless on binary files, of course.

I don't know about git, but Subversion does quite a good job on some
binary files.  Take the compressed tarballs of a couple of Subversion
tags:

   $ svn export http://svn.apache.org/repos/asf/subversion/tags/1.9.5
   $ svn export http://svn.apache.org/repos/asf/subversion/tags/1.9.4
   $ tar cfz foo1.tar.gz 1.9.5
   $ tar cfz foo2.tar.gz 1.9.4
   $ svnadmin create repo
   $ svnmucc -mm -U file://`pwd`/repo put foo1.tar.gz f.tgz
   $ svnmucc -mm -U file://`pwd`/repo put foo2.tar.gz f.tgz

How big are the tarballs?

   $ ls -lh foo*
   -rw-r--r-- 1 pm pm 15M Aug  4 13:00 foo1.tar.gz
   -rw-r--r-- 1 pm pm 15M Aug  4 13:00 foo2.tar.gz

How big in the repository?

   $ ls -lh repo/db/revs/0/[12]
   -r--r--r-- 1 pm pm 15M Aug  4 13:00 repo/db/revs/0/1
   -r--r--r-- 1 pm pm 13M Aug  4 13:00 repo/db/revs/0/2

That saves about 2M. But we can do better by compressing in a way that
takes deltification into account:

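   $ rm -rf repo foo1.tar.gz foo2.tar.gz
   $ tar cf foo1.tar 1.9.5
   $ tar cf foo2.tar 1.9.4
   $ gzip --rsyncable foo1.tar
   $ gzip --rsyncable foo2.tar
   $ svnadmin create repo
   $ svnmucc -mm -U file://`pwd`/repo put foo1.tar.gz f.tgz
   $ svnmucc -mm -U file://`pwd`/repo put foo2.tar.gz f.tgz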

The resulting tarballs are a little bigger:

   -rw-r--r-- 1 pm pm 16M Aug  4 13:05 foo1.tar.gz
   -rw-r--r-- 1 pm pm 16M Aug  4 13:05 foo2.tar.gz

but Subversion can deltify them much better:

   -r--r--r-- 1 pm pm  16M Aug  4 13:05 repo/db/revs/0/1
   -r--r--r-- 1 pm pm 5.6M Aug  4 13:05 repo/db/revs/0/2
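Why does --rsyncable help? As far as I know, gzip --rsyncable resets the compressor at points chosen from a rolling sum over the most recent input. A change in the tar file then only disturbs the compressed output up to the next reset point, and the later compressed bytes stay identical, which is something a delta algorithm can exploit. With ordinary gzip a change usually alters much of the compressed output after it. The Python below is a rough sketch of the idea, not gzip's code: the 64-byte window, the modulus of 256, the helper names reset_points and rsyncable_deflate, and the synthetic word data are all my own assumptions, and each chunk gets a fresh zlib compressor instead of one compressor being reset in place.

```python
import random
import string
import zlib

# Hypothetical parameters, chosen small for a quick demo (not gzip's values).
WINDOW = 64     # number of recent bytes covered by the rolling sum
MODULUS = 256   # start a new chunk when the rolling sum is a multiple of this


def reset_points(data: bytes) -> list[int]:
    """Content-defined chunk boundaries: offsets where a new chunk starts."""
    cuts, rolling = [], 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]  # keep only the last WINDOW bytes
        if i + 1 >= WINDOW and rolling % MODULUS == 0:
            cuts.append(i + 1)
    return cuts


def rsyncable_deflate(data: bytes) -> tuple[list[bytes], bytes]:
    """Simplified imitation of gzip --rsyncable using raw deflate pieces."""
    starts = [0] + reset_points(data)
    ends = starts[1:] + [len(data)]
    pieces = []
    for start, end in zip(starts, ends):
        if start == end:
            continue
        # Fresh compressor per chunk: no back-references across a boundary.
        comp = zlib.compressobj(6, zlib.DEFLATED, -15)
        pieces.append(comp.compress(data[start:end]) + comp.flush(zlib.Z_FULL_FLUSH))
    # Each piece ends byte-aligned, so they concatenate into one stream;
    # finish it with an empty final block.
    final = zlib.compressobj(6, zlib.DEFLATED, -15).flush()
    return pieces, b"".join(pieces) + final


# Synthetic "tarball": random words, then the same text with one byte inserted.
rng = random.Random(0)
vocab = ["".join(rng.choices(string.ascii_lowercase, k=rng.randint(2, 9)))
         for _ in range(500)]
old = " ".join(rng.choices(vocab, k=40_000)).encode()
new = old[:1000] + b"X" + old[1000:]

old_pieces, old_stream = rsyncable_deflate(old)
new_pieces, new_stream = rsyncable_deflate(new)

# Count compressed bytes of the new version that also occur as old pieces.
known = set(old_pieces)
reused = sum(len(p) for p in new_pieces if p in known)
print(f"{len(new_pieces)} pieces, {reused / len(new_stream):.1%} of new stream reused")
```

In this sketch the two inputs differ by one inserted byte, and almost all of the new compressed pieces also appear in the old output. The cost is that matches cannot reach back across a reset point, which fits with the rsyncable tarballs above being a little bigger.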

We have stored two 16MB compressed tarballs in about 21MB of repository
space, compared with 28MB for the tarballs made with plain gzip.

-- 
Philip
