Hi all,

A quick summary first: we’ve found that, for certain use cases, increasing
SVN_DELTA_WINDOW_SIZE yields very significant savings in repository size
(~10x). We have a proof-of-concept (POC) patch that makes it configurable
(currently only via fsfs.conf, and only with libsvn_ra_svn and
libsvn_ra_local), and we would be interested in working with the community
to improve it, with the ultimate goal of getting it accepted into mainline.

I found a previous discussion of this from a good while back
(https://svn.haxx.se/users/archive-2008-02/0547.shtml), but not much else.
If there’s any additional information on this problem that I may have
missed, please feel free to educate me!

Otherwise, if nobody sees any glaring issues with supporting this, I’d be
happy to clean up my POC patch and post it for detailed review.

Obviously, there are trade-offs: a repository would most likely need to
keep the same window size throughout its lifetime, and a larger window
would not be beneficial for every use case. This is why we propose making
it a config option. Some more details of our use case, and some rough
numbers that illustrate the benefits, follow.

Our use case is that we commonly have very large files that see small
changes over time, and a large xdelta window size would benefit such use
cases greatly. For us, the growth rate of our repositories is quite
staggering because small changes to large files produce disproportionately
large deltas, and we’d be more than happy to trade memory usage (and, to an
extent, processing time) to keep this in check. We haven’t yet done
extensive measurements of the runtime impact.

To demonstrate the effect, we generated a random and fairly large (~1.3 GB,
~30 million lines) XML file, committed it, made random changes to it
(ranging from 300 lines to ~1% of lines), committed those, and then looked
at the size of the file that commit generated in repo/db/revs/. We then
repeated this with a 100x larger window size (10240000 bytes, versus the
default of 102400). The most dramatic results are for small changes (which
is somewhat intuitive): the revision containing the changes is ~40x smaller
(10k vs 430k for the 1.3 GB file) with the larger window size. For other
patterns of changes the difference is less dramatic but still large
(5-10x). I am happy to share more details if people are interested.
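
The "somewhat intuitive" part can be illustrated with a toy model. The
sketch below is a deliberately simplified Python illustration, not
Subversion’s actual delta code. It assumes that each target window may only
copy from the source window at the same offset, it matches fixed 16-byte
blocks (an arbitrary choice), and it ignores copies from earlier in the
target window itself. Inserting data near the start of a file shifts
everything after it; once the shift is at least the window size, the
aligned source window no longer contains the shifted content, so almost
every byte has to be stored as new data.

```python
import random

# Arbitrary match granularity for this toy model (not an svn constant).
BLOCK = 16


def literal_bytes(source: bytes, target: bytes, window: int) -> int:
    """Count target bytes a toy windowed copy/insert delta stores verbatim.

    Simplifying assumption: target window i may only copy from source
    window i (same offset); copies within the target window are ignored.
    """
    literal = 0
    for start in range(0, len(target), window):
        src = source[start:start + window]
        tgt = target[start:start + window]
        # Index every BLOCK-byte substring of the aligned source window.
        blocks = {src[i:i + BLOCK] for i in range(len(src) - BLOCK + 1)}
        p = 0
        while p < len(tgt):
            if p + BLOCK <= len(tgt) and tgt[p:p + BLOCK] in blocks:
                p += BLOCK      # copy from source: no new data stored
            else:
                literal += 1    # byte must be stored as new data
                p += 1
    return literal


rng = random.Random(42)
base = rng.randbytes(200_000)
# Insert 5,000 new bytes at the start, shifting everything after them.
changed = rng.randbytes(5_000) + base

small = literal_bytes(base, changed, 4_096)
large = literal_bytes(base, changed, 1_000_000)
print(f"window 4096: {small} new bytes; window 1000000: {large} new bytes")
```

With the small window nearly every byte of the changed file ends up stored
as new data, while with a window covering the whole file only the ~5,000
inserted bytes do. Real deltas are smarter than this toy, so it only shows
the direction of the effect, not its magnitude.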

We believe we’d see a lot of benefit from this option, as would others in
the community. We are very much committed to Subversion in the long run,
so we would love to hear what people think about something like this.

Many thanks,

Nikola Dipanov
Hudson River Trading
