Thoughts about eliminating reachability races

Michael Haggerty Sat, 11 May 2013 02:47:50 -0700

How to get rid of reachability races?

This email is meant more as a basis for discussion (including at the
GitMerge conference that is currently running) than as a fully-baked
proposal.


A lot of reachability-related races have been discussed recently:

* If a reference is renamed while a prune is reading the loose
  references, it could be that prune sees neither the old nor the new
  version of the reference and could prune the objects referred to by
  the reference.

* If a process is building a tree, it could find that some objects are
  already in the DB, in which case it will refer to the existing
  objects.  But these objects might be unreachable, and could be
  pruned before the process gets around to creating a root reference
  that makes them reachable.

* Other races involving packed refs vs. loose refs.

These races are most critical for busy hosting sites like GitHub but
could theoretically impact anybody.

It has been discussed that we should offer, as an alternative to
packed-refs/loose refs in the filesystem, a way to store refs in a
single database store of some kind that offers ACID guarantees.  This
would solve the first and last kind of race in one step.

Other mechanisms are being discussed to address the second race, maybe
involving moving pruned objects into a kind of purgatory for a period
of time before they are finally deleted.

But I have the feeling that these races could also be solved via the
use of the same database.

I am thinking of a scheme in which the database stores two more things
during prunes:

1. The names of objects that are being pruned.  Such objects are
   considered deleted (even though they might not yet have been
   deleted from the object database) and no new references may be made
   to them.

2. The names of objects to which new links have been added.  Prune is
   not allowed to delete this objects or objects to which they refer.

like those used for incremental garbage
collection of memory, in which *during* a garbage collection the VM
keeps tracks of new references that are made to objects that are in
the generation that is currently being garbage collected.  The garbage
collector has to consider these links in its reachability analysis.

Our situation is a little bit more complicated, because it is not easy
to inform a git process that a prune has been started.  At best, the
process can check occasionally whether a prune is running.  On the
other hand, it *is* permissible for Git processes to fail in
exceptional circumstances, as long as it happens rarely and leaves the
object store in a valid state.

So I'm thinking of a scheme in which

1. prune has to list "doomed" objects to the database before deleting
   them.  Such objects are considered deleted by other processes (*if*
   they know that a prune is running).

2. If a process knows that a prune is running, then it records to the
   database any new links that it makes to existing objects.  Prune is
   not allowed to prune such objects.

3. To catch the situation that a process didn't know that a prune is
   running, we keep a prune generation number.  If that number changes
   between the beginning/end of a process, then the process fails
   without recording any new references.


Design goals:

* Correctness

* "Normal processes" don't block on each other or on prune.

* Normal processes don't get significantly slower in the usual case
  (but are allowed to get slower during pruning).


Assumptions:

* There is a store with atomic transactions.

* Most normal processes run more quickly than a prune.  It will be
  rare for an entire prune to run while a normal process is running
  (though even if it does we must behave correctly).

* Normal processes are allowed to fail occasionally as long as they
  leave the DB in a valid state.


In the store, record the following values:

* Git references -- (refname, object_name) pairs

* prune_in_progress (boolean) -- a destructive operation is running

* prune_generation (integer) -- incremented each time objects are
  invalidated

* doomed_objects -- a list of the names of objects that a prune is in
  the process of deleting.  These objects *are considered deleted* and
  may not be referred to.  Only used when prune_in_progress.

* new_links -- a list of the names of objects newly created or newly
  referenced during a prune.  These objects are considered referenced
  and *may not be pruned*.  Only used when prune_in_progress.



A nondestructive operation will usually have two interactions with the
store.  At start::

    def record_new_link(sha1):
        """Record that there is a new link to object sha1."""

        if prune_in_progress:
            if sha1 in doomed_objects:
                die()
            append sha1 to new_links (RAM copy)


    with transaction:
        read prune_generation
        read prune_in_progress
        if prune_in_progress:
            read doomed_objects
        read any needed Git refs

    for each new object or new link to an old object:
        record_new_link(object_name)

    with transaction:
        if prune_generation has changed:
            die()
        if prune_in_progress:
            read current doomed_objects:
            if any of new_links are in doomed_objects:
                die()
            append list of new git references to new_links
        write changes to git references

    # From this moment on, the new_links are considered to be
    # reachable and neither they nor any objects that they refer to
    # may be pruned.



A destructive operation will have at least three interactions with the
store::

    with transaction:
        if prune_in_progress:
            abort
        prune_in_progress = True
        clear doomed_objects list
        clear new_links list
        known_links = list of all object references

    doomed_objects = compute list of objects unreachable from known_links

    while True:
        with transaction:
            write current version of doomed_objects
            read new_links from store
            clear new_links in store

        # From this moment on, objects in doomed_objects are
        # considered deleted.

        if new_links is empty:
            break
        else:
            remove objects reachable from new_links from doomed_objects

    delete doomed objects from filesystem

    with transaction:
        clear list of doomed objects
        increment stored value of prune_generation
        prune_in_progress = False


I hope that's clear; please poke holes in it.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Thoughts about eliminating reachability races

Reply via email to