On Thu, Feb 28, 2013 at 8:04 AM, Magnus Thor Torfason <zulutime....@gmail.com> wrote:
> I've been following the discussion about FSFS format7, and had a question:
> Is there any chance that the format would improve storage efficiency for
> documents that are stored as compressed (zipped) bundles of XML files and
> other resource files (Read MS Office Documents, but OpenOffice is similar).
>
> I'm finding that making very small changes in big documents (with embedded
> images) results in rapid growth of the repository, since the binary diff
> algorithm seems to not be able to figure out efficient deltas for this type
> of documents, even though analysis of the contents shows that they are
> almost unchanged.
I don't think it's in the plan at this point. The question I have is: how would you imagine that SVN should efficiently store these files?

From the file system layer, I don't think there is a good solution to this problem, since the only way I can see to get efficient storage is to start manipulating the files, e.g. decompressing them for storage in the repository. Our file system layer should never start manipulating the content it's storing.

The only solution I see to this problem, and frankly I don't think it's one we're likely to implement, is client-side special handling of certain mime-types. Similar to how we do end-of-line normalization based on a property, we could decompress these files for storage in the repo and then re-compress them on the client side.

That said, let me explain why I think we'd not be likely to implement this:

1) This would require special handling of certain file formats, something I don't think we should get into.

2) We might have the dependencies to decompress some formats, but once we go down this road we'd likely need to pull in more and more exotic libraries, or we'd have to tell people "no, we won't support this one format".

3) You'd be saving storage at the expense of time (read: CPU) on every client that's working with those files when checking out. So the end result may be worse than the current problem.

I just don't see this happening unless someone has a very clever idea that I haven't thought of.
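
For what it's worth, the client-side idea above can be sketched roughly as follows. This is only an illustration in Python using the standard zipfile module; it is not Subversion code, and the helper names (unpack_members, repack_members, changed_members) are made up for this example. It assumes the document is an ordinary zip container, as OpenOffice and newer MS Office files are.

```python
import io
import zipfile
from typing import Dict, List


def unpack_members(archive_bytes: bytes) -> Dict[str, bytes]:
    """Return {member name: uncompressed bytes} for a zip-based document."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


def repack_members(members: Dict[str, bytes]) -> bytes:
    """Re-compress the members into a zip archive (the 'checkout' direction)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


def changed_members(old_archive: bytes, new_archive: bytes) -> List[str]:
    """List the members whose uncompressed content differs between revisions."""
    old, new = unpack_members(old_archive), unpack_members(new_archive)
    return sorted(n for n in old.keys() | new.keys() if old.get(n) != new.get(n))


if __name__ == "__main__":
    # Hypothetical document: one XML part plus one embedded "image".
    members = {
        "content.xml": b"<p>hello</p>" * 100,
        "media/image1.png": bytes(range(256)) * 8,
    }
    rev1 = repack_members(members)

    # A very small edit to the XML part only.
    edited = dict(members)
    edited["content.xml"] = members["content.xml"].replace(b"hello", b"howdy", 1)
    rev2 = repack_members(edited)

    # Only the edited member differs once the container is unpacked.
    print(changed_members(rev1, rev2))
```

Note that re-packing generally won't reproduce the original archive byte for byte (compression level, member order and timestamps can all differ), so anything along these lines would have to deal with that as well.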