On Thu, May 28, 2015 at 6:00 PM, Stefan Fuhrmann <stefan.fuhrm...@wandisco.com> wrote:
> On Wed, May 27, 2015 at 8:14 PM, Philip Martin <philip.mar...@wandisco.com>
> wrote:
>>
>> Julian Foad <julianf...@gmail.com> writes:
>>
>> > Stefan Fuhrmann wrote:
>> >> * clear the rep-cache.db
>> >
>> > Clearing the cache and continuing operation may make subsequent
>> > commits much larger than they should be, and there is no easy way to
>> > undo that if it happens.
>>
>> I've been thinking of writing some code to populate the rep-cache from
>> existing revisions. This code would parse the revision, a bit like
>> verify, identify checksums in that revision and add any that are found
>> to the rep-cache. This would be time consuming if run on the whole
>> repository but would run perfectly well in a separate process while the
>> repository remains live. It could also be run over a revision range
>> rather than just the whole repository, and running on a single revision
>> such as HEAD would be fast.
>
>
> Makes sense.
>
>>
>> I believe the code will be relative straightforward, if anything it is
>> the API that is more of a problem.
>>
>> - We could add a public svn_fs_rep_cache(). This is backend specific
>>   but there is precedent: we have svn_fs_berkeley_logfiles() and
>>   svn_fs_pack().
>>
>> - We could add a more general svn_fs_optimize(). This would do backend
>>   specific optimizations that may change in future versions. Perhaps
>>   passing backend-specific flags?
>
>
> I think svn_fs_optimize(bool online) would make sense
> in the longer term.
>
> In the "offline" case, it could do anything from removing
> duplicate reps as we build the cache to sharding repos
> or repacking shards. Not that I would want to implement
> any of that soon.

I was wondering about that too. I think repopulating the rep-cache
(without the need to take the repos offline) is very interesting, but
I immediately think: functionality to repopulate the rep-cache *and*
(optionally) rewrite rev files to let them use rep sharing (i.e.
effectively deduplicating the repository) ... that would be even
better.

But big +1 on the initial idea already for offering the ability to
rebuild a broken rep-cache (without having to dump/load).

-- 
Johan
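
For illustration only, the repopulation idea discussed above (walk a revision range, collect the checksums found there, add any that are missing to the rep-cache) could be sketched roughly as follows. This is not Subversion code: the `Rep` record, the `populate_rep_cache()` function, and the `reps_by_revision` mapping are hypothetical stand-ins for whatever a real rev-file parser would produce, and the table layout is an assumption loosely modelled on a rep-cache keyed by SHA-1.

```python
import sqlite3
from typing import Dict, List, NamedTuple


class Rep(NamedTuple):
    """Hypothetical record for one representation found in a revision."""
    sha1: str
    revision: int
    offset: int
    size: int
    expanded_size: int


# Assumed table layout (not taken from the thread); check the real schema
# before relying on any column name here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS rep_cache (
    hash TEXT NOT NULL PRIMARY KEY,
    revision INTEGER NOT NULL,
    "offset" INTEGER NOT NULL,
    size INTEGER NOT NULL,
    expanded_size INTEGER NOT NULL
)
"""


def count_rows(db: sqlite3.Connection) -> int:
    return db.execute("SELECT COUNT(*) FROM rep_cache").fetchone()[0]


def populate_rep_cache(db: sqlite3.Connection,
                       reps_by_revision: Dict[int, List[Rep]],
                       start: int, end: int) -> int:
    """Add reps from revisions start..end (inclusive); return rows added.

    Entries already in the cache are left untouched, so the function can be
    re-run safely and the oldest known location of a checksum is kept.
    """
    db.execute(SCHEMA)
    before = count_rows(db)
    for rev in range(start, end + 1):
        for rep in reps_by_revision.get(rev, []):
            db.execute(
                "INSERT OR IGNORE INTO rep_cache VALUES (?, ?, ?, ?, ?)", rep)
        # Commit per revision so a live repository is never blocked for long.
        db.commit()
    return count_rows(db) - before
```

Running it over a single revision touches only that revision's entries, which matches the point that a run on HEAD alone would be fast. Committing once per revision is a design guess aimed at keeping lock times short while the repository stays live.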