Re: Ref-counting for pristine texts

Julian Foad Wed, 26 Jan 2011 10:11:41 -0800

Can anyone help me work out the rules for guaranteeing consistency of
the pristine text store?

On Wed, 2011-01-19, Branko Čibej wrote: 
> On 18.01.2011 16:58, Julian Foad wrote:
> > On Thu, 2011-01-13, Branko Čibej wrote:
> >> This would indicate that the reference counting happens too soon ... in
> >> other words, that a pristine can be dereferenced whilst some part of the
> >> code (or database) still refers to it. That breaks database consistency
> >> -- what happens if the user aborts a commit and then calls 'svn
> >> cleanup', for example?
> >
> > If what I said about 'commit' is correct, then yes, that's bad and we
> > should look for a better way.  But I haven't tested it properly; I
> > noticed that the commit failed saying that the DB failed a 'REFERENCES'
> > clause, and what I said here is a hypothesis about how that happened.
[...]

I've had a chance to look at this now.  What happens now is
svn_client_commit5() does this:

    svn_client__do_commit()
      # Among other things, for each file being committed, this
      # installs into the pristine text store a version that will
      # later be referred to as the new base version, and returns
      # the SHA1 checksum of that.

    if the commit succeeded:
      for each committed item:
        post_process_commit_item(..., item->sha1_checksum)
          # This changes the WC DB to reference the SHA1 checksum
          # of the new pristine text.

The problem is that after svn_client__do_commit(), the new pristine
texts are in the store but unreferenced.

We need to decide on the rule that determines at what times the
reference counts can be assumed correct.

  * Always correct outside a DB txn?

  * Correct when the work queue is empty, outside a DB txn?
    (I mention this idea not because it's necessarily sensible, but
    simply because "work queue is empty" is already being used as a
    signal for some kind of consistency guarantee - and maybe, just
    maybe, we might want to use the same criterion.)

  * Some other rule?

At the moment it's "some other rule".  In more than one place, we insert
a new text some time before we run a txn that inserts a reference to it.

If we want to change the rule to "Outside a DB txn", we will need to
ensure that we always insert pristine texts and their references in the
same txn, not separately.  Is that necessary?  Not sure, but I think
it's necessary to have some clear rule, and the rule needs to be based
on a bit (literally) of data that we can test and that is maintained
centrally in the DB itself.
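
To make the "Outside a DB txn" rule concrete, here is a minimal sketch
using Python's sqlite3 module.  The table, column and function names
(pristine, node, install_and_reference, ...) are made up for
illustration and are not the real wc.db schema or libsvn_wc API, and
the file-move part of installing a pristine text is left out entirely.

```python
import sqlite3

def open_db():
    # isolation_level=None puts the connection in autocommit mode, so
    # transactions start and end only where we say BEGIN and COMMIT.
    db = sqlite3.connect(":memory:", isolation_level=None)
    db.execute("PRAGMA foreign_keys = ON")
    # Hypothetical schema: one row per stored pristine text, and one
    # row per node that refers to a pristine text by checksum.
    db.execute("CREATE TABLE pristine (checksum TEXT PRIMARY KEY)")
    db.execute("CREATE TABLE node ("
               " path TEXT PRIMARY KEY,"
               " checksum TEXT REFERENCES pristine(checksum))")
    return db

def unreferenced(db):
    # Pristine texts that no node refers to; under the "outside a DB
    # txn" rule these would be safe to delete whenever no txn is open.
    rows = db.execute(
        "SELECT checksum FROM pristine"
        " WHERE checksum NOT IN"
        "   (SELECT checksum FROM node WHERE checksum IS NOT NULL)"
        " ORDER BY checksum")
    return [row[0] for row in rows]

def install_and_reference(db, path, checksum):
    # Insert the pristine row and the reference to it in ONE txn, so
    # no other txn ever sees the text stored but unreferenced.
    db.execute("BEGIN")
    try:
        db.execute("INSERT OR IGNORE INTO pristine (checksum) VALUES (?)",
                   (checksum,))
        db.execute("INSERT OR REPLACE INTO node (path, checksum)"
                   " VALUES (?, ?)", (path, checksum))
        db.execute("COMMIT")
    except Exception:
        db.execute("ROLLBACK")
        raise
```

With this shape, a plain INSERT into pristine followed some time later
by a separate update of node reproduces the current behaviour (the
text shows up in unreferenced() in between), whereas
install_and_reference() never leaves such a window.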

That makes me think - should we be using the work queue for the
file-move part of inserting a pristine text into the store?

As for the overall shape of the commit process... Let's take a
high-level view of the commit process and the responsibilities of
libsvn_wc versus libsvn_client.

  1. CLIENT sets up a "commit editor" that is connected to the repo.

  2. CLIENT asks WC to report the changes that need to be committed,
     driving the commit editor.

  3. If commit succeeded, CLIENT asks WC to bump its metadata for
     each committed item to reflect that what was the "local"/
     "actual"/"modified" state is now the "base" state.

But the current implementation tries to take a short cut between 2. and
3.  Because step 2 involves calculating the new base text (as it may
need translating from the working copy form), it wants to store that
translated result so it can be moved immediately into the pristine store
in step 3.  Another benefit is that it isolates the commit process from
any changes the user may be making to the working file while the commit
and post-commit processing are going on.

In terms of file content changes, the current implementation of (part
of) steps 2 and 3 is:

  2. For each file, WC stores the new base text in the pristine
     store, and returns the checksum of that to the CLIENT.

  3. CLIENT passes down the new base checksum of each file, so the
     WC can simply refer to the new base text that is already stored.
     The WC does not need to re-read, re-translate and re-checksum
     the file's working text.

But alternative, and cleaner, implementations could be:

  A. For libsvn_client to take ownership of the new (pending) text-base
file in step 2 and then pass it to libsvn_wc in step 3.

  B. For libsvn_wc to calculate the new text-base twice, in steps 2 and
3 independently.  (But that doesn't isolate from user changes.)

  C. For libsvn_client to take ownership of the new (pending) text-base
file in step 2, kept outside the pristine store and tracked in some
other way.  (Through an opaque data baton passed back to the client for
passing to step 3?)

  D. For libsvn_client to take ownership of the new (pending) text-base
file in step 2, *in* the pristine store but with a proper reference
within the DB, in some new column and/or some new table. ???
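
One possible shape for D, again as a sketch in Python with sqlite3.
Everything here is an assumption made for illustration: the "pending"
table, its columns, and the function names do not exist in the current
schema or API.  The idea is only that step 2 records a reference in the
same txn that stores the text, and step 3 moves that reference into
the node's base row.

```python
import sqlite3

def open_db():
    # Autocommit mode; transactions are delimited explicitly below.
    db = sqlite3.connect(":memory:", isolation_level=None)
    db.execute("CREATE TABLE pristine (checksum TEXT PRIMARY KEY)")
    db.execute("CREATE TABLE node (path TEXT PRIMARY KEY, checksum TEXT)")
    # Hypothetical new table: texts installed during a commit that are
    # not yet the base text of any node.
    db.execute("CREATE TABLE pending ("
               " path TEXT PRIMARY KEY, checksum TEXT NOT NULL)")
    return db

def refcount(db, checksum):
    # A pristine text counts as referenced if either table names it.
    return db.execute(
        "SELECT (SELECT COUNT(*) FROM node WHERE checksum = ?)"
        "     + (SELECT COUNT(*) FROM pending WHERE checksum = ?)",
        (checksum, checksum)).fetchone()[0]

def step2_install(db, path, checksum):
    # Store the new text and a pending reference to it in one txn.
    db.execute("BEGIN")
    try:
        db.execute("INSERT OR IGNORE INTO pristine (checksum) VALUES (?)",
                   (checksum,))
        db.execute("INSERT OR REPLACE INTO pending (path, checksum)"
                   " VALUES (?, ?)", (path, checksum))
        db.execute("COMMIT")
    except Exception:
        db.execute("ROLLBACK")
        raise

def step3_bump(db, path):
    # After a successful commit: turn the pending reference into the
    # node's base reference, again in one txn.
    db.execute("BEGIN")
    try:
        row = db.execute("SELECT checksum FROM pending WHERE path = ?",
                         (path,)).fetchone()
        if row is None:
            raise KeyError(path)
        db.execute("INSERT OR REPLACE INTO node (path, checksum)"
                   " VALUES (?, ?)", (path, row[0]))
        db.execute("DELETE FROM pending WHERE path = ?", (path,))
        db.execute("COMMIT")
    except Exception:
        db.execute("ROLLBACK")
        raise
```

If the commit fails or is aborted between the two steps, the row in
"pending" is still there, so something like 'svn cleanup' could decide
explicitly whether to drop the reference (and then the text) rather
than finding an unexplained unreferenced text in the store.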

Thoughts?

- Julian

> [...]  If the references in the database are maintained
> correctly, then it should be safe to delete after each successful
> transaction commit.

> That's assuming that any high-level operation involves a single
> transaction, e.g., that a commit can't fail in some way that would
> require the pristine to still be in place in order to recover, even if
> the reference count in the database is 0.
> 
> -- Brane
>
