Can anyone help me work out the rules for guaranteeing consistency of the pristine text store?
On Wed, 2011-01-19, Branko Čibej wrote:
> On 18.01.2011 16:58, Julian Foad wrote:
> > On Thu, 2011-01-13, Branko Čibej wrote:
> >> This would indicate that the reference counting happens too soon ... in
> >> other words, that a pristine can be dereferenced whilst some part of the
> >> code (or database) still refers to it. That breaks database consistency
> >> -- what happens if the user aborts a commit and then calls 'svn
> >> cleanup', for example?
> >
> > If what I said about 'commit' is correct, then yes, that's bad and we
> > should look for a better way. But I haven't tested it properly; I
> > noticed that the commit failed saying that the DB failed a 'REFERENCES'
> > clause, and what I said here is a hypothesis about how that happened.

[...]

I've had a chance to look at this now. What happens now is
svn_client_commit5() does this:

  svn_client__do_commit()
    # Among other things, for each file being committed, this
    # installs into the pristine text store a version that will
    # later be referred to as the new base version, and returns
    # the SHA1 checksum of that.

  if the commit succeeded:
    for each committed item:
      post_process_commit_item(..., item->sha1_checksum)
        # This changes the WC DB to reference the SHA1 checksum
        # of the new pristine text.

The problem is that after svn_client__do_commit(), the new pristine
texts are in the store but unreferenced.

We need to decide on the rule that determines at what times the
reference counts can be assumed correct.

  * Always correct outside a DB txn?

  * Correct when the work queue is empty, outside a DB txn?
    (I mention this idea not because it's necessarily sensible, but
    simply because "work queue is empty" is already being used as a
    signal for some kind of consistency guarantee - and maybe, just
    maybe, we might want to use the same criterion.)

  * Some other rule?

At the moment it's "some other rule". In more than one place, we insert
a new text some time before we run a txn that inserts a reference to it.

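To make the window concrete, here is a minimal sketch of the sequence described above, using an in-memory SQLite database. Everything in it is an assumption for illustration: the two-table schema is a toy, not the real WC DB schema, and install_pristine(), remove_unreferenced() and reference_pristine() are hypothetical names, not libsvn_wc functions. It only shows how a text installed in one txn and referenced in a later one can disappear in between, which would match the hypothesis about the 'REFERENCES' failure.

```python
import sqlite3


def open_store():
    """Open a toy store: one table of pristine texts, one of nodes."""
    db = sqlite3.connect(":memory:")
    # SQLite only enforces REFERENCES clauses when this pragma is on.
    db.execute("PRAGMA foreign_keys = ON")
    db.execute("CREATE TABLE pristine (checksum TEXT PRIMARY KEY)")
    db.execute(
        "CREATE TABLE nodes ("
        " path TEXT PRIMARY KEY,"
        " checksum TEXT REFERENCES pristine(checksum))"
    )
    return db


def install_pristine(db, checksum):
    """Txn 1: add a text to the store; nothing refers to it yet."""
    with db:
        db.execute("INSERT OR IGNORE INTO pristine VALUES (?)", (checksum,))


def remove_unreferenced(db):
    """Stand-in for a cleanup pass that trusts the references as they are."""
    with db:
        db.execute(
            "DELETE FROM pristine WHERE checksum NOT IN"
            " (SELECT checksum FROM nodes WHERE checksum IS NOT NULL)"
        )


def reference_pristine(db, path, checksum):
    """Txn 2: make a node refer to a text assumed to be in the store."""
    with db:
        db.execute("INSERT OR REPLACE INTO nodes VALUES (?, ?)", (path, checksum))


def demo():
    db = open_store()
    install_pristine(db, "sha1-new")   # during the commit drive
    remove_unreferenced(db)            # something cleans up in the gap
    try:
        reference_pristine(db, "foo.c", "sha1-new")  # post-commit step
    except sqlite3.IntegrityError:
        return "reference failed"
    return "reference ok"


if __name__ == "__main__":
    print(demo())
```

In this toy model the post-commit step fails, because the cleanup pass ran while the new text was in the store but unreferenced.
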
If we want to change the rule to "Outside a DB txn", we will need to
ensure that we always insert pristine texts and their references in the
same txn, not separately. Is that necessary? Not sure, but I think it's
necessary to have some clear rule, and the rule needs to be based on a
bit (literally) of data that we can test and that is maintained
centrally in the DB itself.

That makes me think - should we be using the work queue for the
file-move part of inserting a pristine text into the store?


As for the overall shape of the commit process...

Let's take a high-level view of the commit process and the
responsibilities of libsvn_wc versus libsvn_client.

  1. CLIENT sets up a "commit editor" that is connected to the repo.

  2. CLIENT asks WC to report the changes that need to be committed,
     driving the commit editor.

  3. If commit succeeded, CLIENT asks WC to bump its metadata for each
     committed item to reflect that what was the "local"/
     "actual"/"modified" state is now the "base" state.

But the current implementation tries to take a short cut between steps
2 and 3. Because step 2 involves calculating the new base text (as it
may need translating from the working copy form), it wants to store that
translated result so it can be moved immediately into the pristine store
in step 3. Another benefit is that it isolates the commit process from
any changes the user may be making to the working file while the commit
and post-commit processing is going on.

In terms of file content changes, the current implementation of (part
of) steps 2 and 3 is:

  2. For each file, WC stores the new base text in the pristine store,
     and returns the checksum of that to the CLIENT.

  3. CLIENT passes down the new base checksum of each file, so the WC
     can simply refer to the new base text that is already stored. The
     WC does not need to re-read, re-translate and re-checksum the
     file's working text.

But alternative, and cleaner, implementations could be

  A. For libsvn_client to take ownership of the new (pending) text-base
     file in step 2 and then pass it to libsvn_wc in step 3.

  B. For libsvn_wc to calculate the new text-base twice, in steps 2 and
     3 independently. (But that doesn't isolate from user changes.)

  C. For libsvn_client to take ownership of the new (pending) text-base
     file in step 2, not in the pristine store but tracked in some
     other way. (Through an opaque data baton passed back to the client
     for passing to step 3?)

  D. For libsvn_client to take ownership of the new (pending) text-base
     file in step 2, *in* the pristine store but with a proper reference
     within the DB, in some new column and/or some new table.

???

Thoughts?

- Julian


> [...] If the references in the database are maintained
> correctly, then it should be safe to delete after each successful
> transaction commit.
> That's assuming that any high-level operation involves a single
> transaction, e.g., that a commit can't fail in some way that would
> require the pristine to still be in place in order to recover, even if
> the reference count in the database is 0.
>
> -- Brane
>
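
For discussion, here is a rough sketch of how option D might combine with Brane's point about deleting after each successful transaction. It is a toy model on an in-memory SQLite database, not the real WC DB: the `pending_commit` table and the functions install_pending(), finish_commit() and remove_unreferenced() are all hypothetical names invented for this example. The idea shown is that the new text and a "pending" reference to it go in during one txn, and the post-commit step swaps the pending reference for the real one in another txn, so the references are correct whenever no txn is open.

```python
import sqlite3

# Toy schema (an assumption, not the real WC DB layout).
SCHEMA = """
CREATE TABLE pristine (checksum TEXT PRIMARY KEY);
CREATE TABLE nodes (
  path TEXT PRIMARY KEY,
  checksum TEXT REFERENCES pristine(checksum));
CREATE TABLE pending_commit (
  path TEXT PRIMARY KEY,
  checksum TEXT NOT NULL REFERENCES pristine(checksum));
"""


def open_store():
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")  # enforce REFERENCES clauses
    db.executescript(SCHEMA)
    return db


def install_pending(db, path, checksum):
    """Step 2: store the new text and a pending reference in ONE txn."""
    with db:
        db.execute("INSERT OR IGNORE INTO pristine VALUES (?)", (checksum,))
        db.execute(
            "INSERT OR REPLACE INTO pending_commit VALUES (?, ?)",
            (path, checksum),
        )


def finish_commit(db, path):
    """Step 3: turn the pending reference into the base reference in ONE txn."""
    with db:
        (checksum,) = db.execute(
            "SELECT checksum FROM pending_commit WHERE path = ?", (path,)
        ).fetchone()
        db.execute("INSERT OR REPLACE INTO nodes VALUES (?, ?)", (path, checksum))
        db.execute("DELETE FROM pending_commit WHERE path = ?", (path,))


def remove_unreferenced(db):
    """Delete texts that neither a node nor a pending commit refers to."""
    with db:
        db.execute(
            "DELETE FROM pristine"
            " WHERE checksum NOT IN"
            "   (SELECT checksum FROM nodes WHERE checksum IS NOT NULL)"
            " AND checksum NOT IN (SELECT checksum FROM pending_commit)"
        )


def pristines(db):
    return sorted(row[0] for row in db.execute("SELECT checksum FROM pristine"))


def demo():
    db = open_store()
    with db:  # existing base text for foo.c
        db.execute("INSERT INTO pristine VALUES ('sha1-old')")
        db.execute("INSERT INTO nodes VALUES ('foo.c', 'sha1-old')")

    install_pending(db, "foo.c", "sha1-new")
    remove_unreferenced(db)            # cleanup in the gap is now harmless
    during_commit = pristines(db)

    finish_commit(db, "foo.c")
    remove_unreferenced(db)            # old base text is now unreferenced
    return during_commit, pristines(db)


if __name__ == "__main__":
    print(demo())
```

In this model a cleanup between steps 2 and 3 keeps both texts, and a cleanup after step 3 removes only the old base text. It says nothing about recovery if the commit itself fails part-way; a pending row left behind would keep its text alive until something removes the row.
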