I'm looking for something to dig back into wc-ng. People should grab this rather than wait for me, but I'll see what I can do in the next few days...
On Tue, Jan 18, 2011 at 10:58, Julian Foad <julian.f...@wandisco.com> wrote: > Can anyone spare a bit of time to take this forward? I'll tell you what > I'm planning to do next. Maybe someone can get to this before I can. > > The first thing I want to do is add a DB index on the 'refcount' column > of the 'pristine' table. I'm convinced that will be necessary for > adequate performance. To do so involves: > > * Adding the 'CREATE INDEX' SQL statement. That's simple - just > copy/modify/paste one line in wc-metadata.sql. > > * Adding a WC format bump step that adds the index to existing dev > WCs. Also quite simple. > > * I'd be happier if I ran a timing comparison, before and after, with > searching for unreferenced pristines (those whose refcount is 0). That > would give me confidence that adding the index has actually happened and > that it has the desired effect. It would need a test with a large > number of pristines, a scattered selection of them having refcount=0, > and then time the running of svn_wc__db_pristine_cleanup(). With an > index, this should find them all very quickly and the deletion time > should dominate the whole cleanup operation, whereas without an index > the time taken to scan the whole table should dominate when the table is > very large. > > > On Thu, 2011-01-13, Branko Čibej wrote: >> On 13.01.2011 20:01, Julian Foad wrote: >> > I have committed the ref counting for pristine texts (r1058523) and have >> > had a bit more insight into when to perform the deletion. >> > >> > Deleting unreferenced texts on closing a "wcroot" is too late - too >> > large a granularity - for the way the WC code is currently structured >> > because this doesn't happen until a (long-lived) pool is cleaned up, as >> > Bert pointed out. >> > >> > Deleting unreferenced texts after running the Work Queue is too soon - >> > too fine a granularity - for the way "commit" is currently structured. >> > A simple test of this approach showed that by the time the post-commit >> > processing tries to install a reference to a pristine text whose >> > checksum was noted earlier, in some cases that pristine row has already >> > been purged. >> >> This would indicate that the reference counting happens too soon ... in >> other words, that a pristine can be dereferenced whilst some part of the >> code (or database) still refers to it. That breaks database consistency >> -- what happens if the user aborts a commit and then calls 'svn >> cleanup', for example? > > If what I said about 'commit' is correct, then yes, that's bad and we > should look for a better way. But I haven't tested it properly; I > noticed that the commit failed saying that the DB failed a 'REFERENCES' > clause, and what I said here is a hypothesis about how that happened. > > Investingating this is the second thing I would want to do next. > > >> > At the moment I think the practical solution is to insert calls to the >> > pristine cleanup at the end of each main libsvn_wc public API that may >> > have freed up one or more pristines. >> >> Mmm. Too many points of failure, don't you think? Of course, it's no >> "worse" than the cancellation handlers, however, and I suppose missing a >> chance to delete is better than having one too many. > > I'm thinking this is nevertheless the best option. The consequence of > missing some places is not too bad at all. If we miss places where we > should clean up, then all it means is that some operations will leak > unused pristines, but they won't be leaked permanently, only until the > user runs one of the operations that do have the cleanup call in them. > That's a consequence of calling a global "clean up all unreferenced > pristines" function rather than trying to keep track of which pristines > have been freed up by *this* operation. > > >> > Wherever we do it, if it's at all frequent, the search for unreferenced >> > pristines needs to be efficient. At present I use "SELECT ... FROM >> > PRISTINE WHERE refcount = 0". I believe that can most easily be made >> > efficient by adding an index on the 'refcount' column, so that's what I >> > intend to do. >> >> That's the only thing you can do, unless you want to enhance the >> triggers to populate a queue of to-be-deleted pristines. > > I agree that adding an index on the table is a better solution than > trying to maintain a separate queue of the pristines that are waiting to > be cleaned up. The index makes it efficient for a simple query > ('refcount = 0') to provide the same information as the separate queue > would provide. > > - Julian > > > >