On Thu, Feb 28, 2013 at 8:37 AM, Ben Reser <b...@reser.org> wrote:
> I just don't see this happening unless someone has a very clever idea
> that I haven't thought of.

When I spoke with Julian here at ApacheCon, he mentioned that gzip has an rsyncable option. Looking into it, it turns out that Debian applies a patch to its gzip that provides this option. The patch periodically resets the compressor's state at points chosen from the content itself (using a rolling checksum), so stretches of the file that did not change compress to identical blocks that can be shared between revisions. gzip uses the same DEFLATE algorithm that most zip files use, so the same idea could be applied to them.

If we want to handle something like this in Subversion, I think we'd do it with some sort of plugin for specific file types that converts them, before committing, to an encoding that deltifies more efficiently. Unfortunately, we don't have any plugin infrastructure of that kind today.

Even so, there are things that can be done today. I made a couple of trivial Microsoft Office Word documents, one containing the characters "abc" and one containing "abcdef". I saved each of them both as .docx and in the Word 2003 flat XML format. The delta between the two .docx files was 3262 bytes; the delta between the two .xml files was just 358 bytes.

OpenOffice/LibreOffice support flat (uncompressed) versions of their formats (e.g. .fodt), which can likewise be stored more efficiently in Subversion. LibreOffice even has a wiki page about this: https://wiki.documentfoundation.org/Libreoffice_and_subversion
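To make the rsyncable idea concrete, here is a rough Python sketch using only the standard zlib module. It is a simplified stand-in, not the Debian patch's code. Chunk boundaries are chosen where a rolling byte sum over a window is a multiple of the window size, and each chunk is compressed by a fresh raw-DEFLATE compressor, which plays the role of the "reset". The names WINDOW, chunk_boundaries, compress_rsyncable and make_text, the window size, and the sample data are all hypothetical, picked for illustration.

```python
import random
import zlib

# Hypothetical parameter: rolling-window size, also used as the modulus that
# decides where a chunk ends (roughly once per WINDOW bytes on average).
WINDOW = 1024


def chunk_boundaries(data: bytes, window: int = WINDOW) -> list[int]:
    """Return chunk end offsets chosen from the content itself.

    A chunk ends wherever the sum of the last `window` bytes is a multiple of
    `window`, so each boundary depends only on nearby bytes and survives
    edits made elsewhere in the file.
    """
    cuts = []
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]  # drop the byte leaving the window
        if i + 1 >= window and rolling % window == 0:
            cuts.append(i + 1)
    if not cuts or cuts[-1] != len(data):
        cuts.append(len(data))  # the tail always forms the last chunk
    return cuts


def compress_rsyncable(data: bytes) -> list[bytes]:
    """Compress each chunk with a fresh raw-DEFLATE compressor.

    No chunk refers back to earlier data, so identical chunks always give
    identical bytes. Every segment but the last ends with a full flush
    (byte-aligned, no final-block bit), so the segments concatenate into a
    single valid raw DEFLATE stream.
    """
    cuts = chunk_boundaries(data)
    segments = []
    start = 0
    for n, end in enumerate(cuts):
        comp = zlib.compressobj(9, zlib.DEFLATED, -15)  # -15 selects raw DEFLATE
        mode = zlib.Z_FINISH if n == len(cuts) - 1 else zlib.Z_FULL_FLUSH
        segments.append(comp.compress(data[start:end]) + comp.flush(mode))
        start = end
    return segments


def make_text(seed: int, words: int) -> bytes:
    """Hypothetical sample document: random picks from a random vocabulary."""
    rng = random.Random(seed)
    letters = "abcdefghijklmnopqrstuvwxyz"
    vocab = ["".join(rng.choice(letters) for _ in range(rng.randint(3, 8)))
             for _ in range(500)]
    return " ".join(rng.choice(vocab) for _ in range(words)).encode("ascii")


# Two "revisions": the second has a small insertion in the middle.
old_rev = make_text(seed=1, words=20000)
new_rev = old_rev[:60000] + b"a small edit " + old_rev[60000:]

old_segs = compress_rsyncable(old_rev)
new_segs = compress_rsyncable(new_rev)

# Count compressed segments of the new revision already present in the old one.
old_set = set(old_segs)
reused = sum(seg in old_set for seg in new_segs)
print(f"{reused} of {len(new_segs)} compressed segments are unchanged")
```

Only the segments near the edit should differ. By contrast, zlib.compress(new_rev) typically differs from zlib.compress(old_rev) from roughly the edit onward, since nothing forces the compressed bytes after the change to line up again. The cost of resetting per chunk is a somewhat worse compression ratio, because each chunk starts without any history.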