Hi all,

A quick summary first: we’ve found that for certain use cases, increasing SVN_DELTA_WINDOW_SIZE yields very significant savings in repository size (~10x). We have a proof-of-concept (POC) patch that makes it configurable (currently only via fsfs.conf, and only when using libsvn_ra_svn and libsvn_ra_local). We would be interested in working with the community to see how the changes could be improved, with the ultimate goal of having them accepted into the mainline.

I found one earlier discussion of this from a good while back, https://svn.haxx.se/users/archive-2008-02/0547.shtml, but not much else. If there’s any additional information on this problem that I may have missed, please feel free to educate me! Otherwise, if nobody sees any glaring issues with supporting this, I’d be happy to clean up my POC patch and post it for detailed review.

Obviously, there are trade-offs in doing this: a repo would most likely need to use one window size throughout its lifetime, and a larger window would not be beneficial for every use case. This is why we propose to keep it as a config option.

Some more details of our use case, and some rough numbers that illustrate the benefits, follow.

We commonly have very large files that see small changes over time, and a large xdelta window size benefits such cases greatly. The growth pace of our repositories is quite staggering due to this storage amplification, and we’d be more than happy to trade memory usage in particular, and processing time to an extent, to keep it in check. We’ve not yet done extensive measurements of how this impacts runtime.

To demonstrate the effect, we generated a random and fairly large (~1.3 GB, ~30 million lines) XML file, committed it, then made random changes to it (from 300 lines up to ~1% of lines), committed those, and looked at the size of the revision file each commit produced in repo/db/revs/. We then repeated this with a 100x larger window size (10240000).

The most dramatic results are for small changes (this is somewhat intuitive), where the revision containing the changes is ~40x smaller (10k vs 430k for a 1.3 GB file) with the larger window size. For other patterns of changes the difference is less dramatic but still large (5-10x). I am happy to share more details if people are interested.

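As a rough back-of-the-envelope illustration (not a measurement), the sketch below counts how many delta windows a file of this size is split into at each window size. It assumes the file is simply cut into fixed-size windows, takes ~1.3 GB as 1.3e9 bytes, and infers the smaller window size (102400) from the "100x larger" figure above. The helper name window_count is made up for this example. Dividing the reported revision sizes by these counts only hints at how much of a small-change revision could be per-window cost; that interpretation is an assumption and has not been verified.

```python
import math

# Sizes taken or inferred from the numbers quoted above (all approximate).
SMALL_WINDOW = 102_400        # inferred: 10240000 / 100
LARGE_WINDOW = 10_240_000     # the 100x larger window used in the test
FILE_SIZE = 1_300_000_000     # ~1.3 GB test file, assumed to be 1.3e9 bytes


def window_count(file_size: int, window_size: int) -> int:
    """Number of fixed-size windows needed to cover file_size bytes."""
    return math.ceil(file_size / window_size)


small_windows = window_count(FILE_SIZE, SMALL_WINDOW)
large_windows = window_count(FILE_SIZE, LARGE_WINDOW)

# Reported revision sizes for the small-change case (~430k vs ~10k),
# treated here as round byte counts purely for illustration.
print(f"windows at {SMALL_WINDOW}: {small_windows}")
print(f"windows at {LARGE_WINDOW}: {large_windows}")
print(f"bytes per window (small): {430_000 / small_windows:.1f}")
print(f"bytes per window (large): {10_000 / large_windows:.1f}")
```

The window count drops by the same factor of 100 as the window size grows, from roughly 12,700 windows to 127.
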
We believe we’d see a lot of benefit from this option, as would others in the community. We are very much committed to Subversion in the long run, so we would love to hear what people think about something like this.

Many thanks,
Nikola Dipanov
Hudson River Trading