Erik Huelsmann <[email protected]> writes:
> Like the others, I'm surprised we seem to be going with a custom file
> format. You claim source files are generally small in size and hence
> only small benefits can be had from compressing them, if at all, since
> they would already be of sub-block size.
I was surprised too, so I looked at GCC where a trunk checkout has
75,000 files of various types:
$ find .svn/pristine -type f | wc -l
75192
Uncompressed:
$ du -hs .svn/pristine
635M .svn/pristine
$ find .svn/pristine -type f | xargs ls -ls | awk '{tot += $1} END {print tot}'
641536
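(du and the first ls -s column count allocated blocks, so every small
file is rounded up to the filesystem block size.  Summing the actual
byte sizes instead would be something like this, assuming GNU find:
$ find .svn/pristine -type f -printf '%s\n' | awk '{tot += $1} END {print tot}'
and should come out somewhat below the block-based totals.)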
Individually compressed, it is smaller by almost a factor of 2:
$ find .svn/pristine -type f | xargs gzip
$ du -hs .svn/pristine
367M .svn/pristine
$ find .svn/pristine -type f | xargs ls -ls | awk '{tot += $1} END {print tot}'
365624
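Part of why per-file compression does poorly here is that every gzip
stream pays a fixed cost: a header, a trailer and a fresh dictionary.
The pure per-stream overhead is easy to see by compressing empty input:
$ printf '' | gzip | wc -c
which is around 20 bytes with a typical gzip, before any data is
stored, and the dictionary reset means no sharing between similar
files.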
Decompressed and recompressed as one single stream, it is smaller by
another factor of 3:
$ find .svn/pristine -type f | xargs zcat | gzip > one-big-file
$ du -hs one-big-file
122M one-big-file
$ ls -ls one-big-file | awk '{print $1}'
124516
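The extra factor of 3 must come from cross-file redundancy: a single
stream can find matches across file boundaries, which 75,000 separate
streams cannot.  A toy demonstration, using two identical 16K files so
the repeat stays inside gzip's 32K window:
$ head -c 16384 /dev/urandom > a
$ cp a b
$ gzip -c a | wc -c
$ cat a b | gzip | wc -c
The two counts come out almost the same, because the whole of b is
encoded as matches against a; compressing a and b separately would
cost twice as much.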
When individually compressed, most of the 75,000 files are less
than 4K:
$ find .svn/pristine -type f -size -4096c | wc -l
71571
more than half are less than 1K:
$ find .svn/pristine -type f -size -1024c | wc -l
53707
and nearly half are less than 0.5K:
$ find .svn/pristine -type f -size -512c | wc -l
36521
In the uncompressed state:
62323 are less than 4K
36648 are less than 1K
21828 are less than 0.5K
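For anyone wanting to reproduce the distribution, a single pass over
the sizes does it (again assuming GNU find):
$ find .svn/pristine -type f -printf '%s\n' | awk '$1<512{a++} $1<1024{b++} $1<4096{c++} END{print a, b, c}'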
Maybe GCC is not typical but, rather to my surprise, combining the
files into one compressed stream would be a significant improvement.
I also have an httpd trunk checkout (it needs cleanup, so it is bigger
than normal):
90M uncompressed
37M individually compressed
23M as one big file
That's more like your figures for Subversion, where the major step is
individual compression.
--
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com