Populating the rep-cache

Philip Martin Wed, 27 May 2015 11:16:12 -0700

Julian Foad <julianf...@gmail.com> writes:

> Stefan Fuhrmann wrote:
>> * clear the rep-cache.db
>
> Clearing the cache and continuing operation may make subsequent
> commits much larger than they should be, and there is no easy way to
> undo that if it happens.


I've been thinking of writing some code to populate the rep-cache from
existing revisions.  This code would parse the revision, a bit like
verify, identify checksums in that revision and add any that are found
to the rep-cache.  This would be time consuming if run on the whole
repository but would run perfectly well in a separate process while the
repository remains live.  It could also be run over a revision range
rather than just the whole repository, and running on a single revision
such as HEAD would be fast.
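
To make the idea concrete, here is a rough sketch in Python rather than
FSFS C code.  Everything in it is an assumption made for illustration:
the simplified rep_cache table (the real rep-cache.db has its own
schema), the reps_in_revision callback standing in for the revision
parser, and the names open_rep_cache and populate_rep_cache.  It only
shows the shape of the loop: walk a revision range and add any checksum
that is not already in the cache.

```python
import sqlite3


def open_rep_cache(path=":memory:"):
    """Open (or create) a toy rep-cache; the schema is a simplified assumption."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS rep_cache ("
        " hash TEXT PRIMARY KEY, revision INTEGER NOT NULL,"
        " offset INTEGER NOT NULL, size INTEGER NOT NULL)"
    )
    return db


def populate_rep_cache(db, reps_in_revision, start, end):
    """Add representations found in revisions start..end (inclusive).

    reps_in_revision(rev) is a hypothetical stand-in for parsing a
    revision; it yields (sha1, offset, size) tuples.
    Returns the number of rows actually added.
    """
    added = 0
    for rev in range(start, end + 1):
        # One transaction per revision, so the cache is never held
        # locked for the whole run while the repository stays live.
        with db:
            for sha1, offset, size in reps_in_revision(rev):
                # INSERT OR IGNORE leaves an existing entry untouched,
                # so rerunning over the same range is harmless.
                cur = db.execute(
                    "INSERT OR IGNORE INTO rep_cache VALUES (?, ?, ?, ?)",
                    (sha1, rev, offset, size),
                )
                added += cur.rowcount
    return added
```

Running it on a single revision is just start == end, which is why the
HEAD-only case would be cheap.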

I believe the code will be relatively straightforward; if anything, it
is the API that is more of a problem.

 - We could add a public svn_fs_rep_cache().  This is backend specific
   but there is precedent: we have svn_fs_berkeley_logfiles() and
   svn_fs_pack().

 - We could add a more general svn_fs_optimize().  This would do backend
   specific optimizations that may change in future versions.  Perhaps
   passing backend-specific flags?

 - We could add the behaviour to svn_fs_recover() by revving the function
   with a revision range.  This would "recover" the rep-cache after the
   existing recovery.  At present recover is fast so to preserve that
   the compatibility function would pass a revision range that is just
   HEAD.

 - We could avoid a public API and call some FSFS function from svnfsfs.
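
For the svn_fs_recover() option, the compatibility arrangement could
look roughly like the sketch below.  It is Python pseudostructure, not
the real C API: recover2, the dict standing in for the filesystem, and
the returned list of steps are all hypothetical, chosen only to show
that the old entry point would pass a range consisting of HEAD alone.

```python
def recover2(fs, start_rev, end_rev):
    """Hypothetical revved recover: existing recovery, then the rep-cache."""
    steps = ["recover"]  # stands in for today's (fast) recovery
    steps.append(("populate-rep-cache", start_rev, end_rev))
    return steps


def recover(fs):
    """Compatibility wrapper: the range is just HEAD, so it stays fast."""
    head = fs["youngest"]  # hypothetical field holding the HEAD revision
    return recover2(fs, head, head)
```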

I'll probably go with the last option initially.  Any comments?

I should note that WANdisco has an interest in this code being
developed.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*
