Erik Huelsmann <ehu...@gmail.com> writes:

> As the others, I'm surprised we seem to be going with a custom file
> format. You claim source files are generally small in size and hence
> only small benefits can be had from compressing them, if at all, due
> to the fact that they would be of sub-block size already.
I was surprised too, so I looked at GCC, where a trunk checkout has
75,000 files of various types:

$ find .svn/pristine -type f | wc -l
75192

Uncompressed:

$ du -hs .svn/pristine
635M    .svn/pristine
$ find .svn/pristine -type f | xargs ls -ls | awk '{tot += $1} END {print tot}'
641536

Individually compressed it is smaller by a factor of 2:

$ find .svn/pristine -type f | xargs gzip
$ du -hs .svn/pristine
367M    .svn/pristine
$ find .svn/pristine -type f | xargs ls -ls | awk '{tot += $1} END {print tot}'
365624

Combined into one single file it is smaller by another factor of 3:

$ find .svn/pristine -type f | xargs cat >> one-big-file
$ du -hs one-big-file
122M    one-big-file
$ ls -ls one-big-file | awk '{print $1}'
124516

When individually compressed, most of the 75,000 files are less than 4K:

$ find .svn/pristine -size -4096c | wc -l
71571

more than half are less than 1K:

$ find .svn/pristine -size -1024c | wc -l
53707

and nearly half are less than 0.5K:

$ find .svn/pristine -size -512c | wc -l
36521

In the uncompressed state:

  62323 are less than 4K
  36648 are less than 1K
  21828 are less than 0.5K

Maybe GCC is not typical but, rather to my surprise, combining the
compressed files would be a significant improvement.

I also have an httpd trunk checkout (it needs a cleanup, so it is
bigger than normal):

  90M  uncompressed
  37M  individually compressed
  23M  as one big file

That's more like your figures for Subversion, where the major step is
individual compression.
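Most of that last factor of 3 is presumably filesystem block rounding
rather than extra compression: each of the 75,000 individually
compressed files still occupies at least one block, even when its
contents are only a few hundred bytes. A rough way to check this (a
sketch assuming GNU find and du; these commands are not part of the
measurements above) is to compare the apparent bytes in the tree with
the bytes actually allocated:

$ du -sh --apparent-size .svn/pristine   # sum of file contents only
$ du -sh .svn/pristine                   # includes per-file block rounding
$ find .svn/pristine -type f -printf '%s %b\n' \
      | awk '{bytes += $1; alloc += $2 * 512} END {print bytes, alloc}'

If the allocated total comes out near the individually compressed 367M
while the apparent total comes out near the 122M single file, the gap
is per-file overhead rather than anything a different compressor would
recover.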