Author: julianfoad
Date: Thu Feb 17 17:21:06 2011
New Revision: 1071707

URL: http://svn.apache.org/viewvc?rev=1071707&view=rev
Log:
* notes/wc-ng/pristine-store
  New file, as first presented for review in the email thread started
  on 2011-02-15, subject 'Pristine store - spec'.

Added:
    subversion/trunk/notes/wc-ng/pristine-store

Added: subversion/trunk/notes/wc-ng/pristine-store
URL: http://svn.apache.org/viewvc/subversion/trunk/notes/wc-ng/pristine-store?rev=1071707&view=auto
==============================================================================
--- subversion/trunk/notes/wc-ng/pristine-store (added)
+++ subversion/trunk/notes/wc-ng/pristine-store Thu Feb 17 17:21:06 2011
@@ -0,0 +1,206 @@
+THE PRISTINE STORE
+==================
+
+=== Introduction ===
+
+The Pristine Store is the part of the Working Copy metadata that holds
+a local copy of the full text of the base version of each WC file.
+
+Texts in the Pristine Store are addressed only by their SHA-1 checksum.
+The Pristine Store does not track which text relates to which repository
+and revision and path.  The Pristine Store does not hold pristine copies
+of directories, nor of properties.
+
+The Pristine Store data is held in
+ * the 'PRISTINE' table in the SQLite Data Base (SDB), and
+ * the files in the 'pristine' directory.
+
+This specification uses SDB transactions to ensure the consistency of
+writes and reads.
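For illustration only, here is a minimal Python sketch of how one text maps onto its PRISTINE row and its file. The table definition lists only the columns this spec mentions. The flat `pristine/<sha1>` file naming and the helper names `text_keys`, `pristine_path` and `open_sdb` are assumptions; the real wc.db schema and on-disk layout may differ.

```python
import hashlib
import os
import sqlite3

# Hypothetical PRISTINE table: only the columns named in this spec.
SCHEMA = """
CREATE TABLE IF NOT EXISTS PRISTINE (
  checksum     TEXT NOT NULL PRIMARY KEY,  -- SHA-1 hex: the only address
  size         INTEGER NOT NULL,           -- length of the text in bytes
  refcount     INTEGER NOT NULL,
  md5_checksum TEXT NOT NULL
);
"""

def text_keys(data: bytes):
    """Return (sha1_hex, md5_hex, size) for a full text."""
    return (hashlib.sha1(data).hexdigest(),
            hashlib.md5(data).hexdigest(),
            len(data))

def pristine_path(pristine_dir: str, sha1_hex: str) -> str:
    """Assumed layout: one file per text, named by its SHA-1 checksum."""
    return os.path.join(pristine_dir, sha1_hex)

def open_sdb(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) an SDB holding the hypothetical PRISTINE table."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

No repository, revision or path appears in the row or in the file name. Two identical texts from different files therefore share one entry.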
+
+==== Invariants ====
+
+The operating procedures below maintain the following invariants.
+These invariants apply at all times except within the SDB txns defined
+below.
+
+* Each row in the PRISTINE table has an associated pristine text file
+  that is not open for writing, is available for reading, and whose
+  content matches the columns 'size', 'checksum' and 'md5_checksum'.
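The invariant can be checked mechanically. The Python sketch below assumes the same hypothetical flat `pristine/<sha1>` layout and a hypothetical function name. It reports every row whose file is missing, unreadable or inconsistent with the recorded metadata. It cannot portably tell whether another process has the file open for writing, so that part is not checked. The check is only meaningful when run outside the add and remove txns.

```python
import hashlib
import os
import sqlite3

def check_pristine_invariant(conn: sqlite3.Connection, pristine_dir: str):
    """Return (checksum, problem) pairs; an empty list means all rows are OK.

    Assumes the flat pristine_dir/<sha1> layout.  Whether a file is open for
    writing by another process is not checked (no portable way to ask).
    Only meaningful outside the add/remove txns.
    """
    problems = []
    for checksum, size, md5 in conn.execute(
            "SELECT checksum, size, md5_checksum FROM PRISTINE"):
        try:
            with open(os.path.join(pristine_dir, checksum), "rb") as f:
                data = f.read()
        except OSError as e:
            problems.append((checksum, "unreadable: %s" % e.strerror))
            continue
        if len(data) != size:
            problems.append((checksum, "size mismatch"))
        if hashlib.sha1(data).hexdigest() != checksum:
            problems.append((checksum, "checksum mismatch"))
        if hashlib.md5(data).hexdigest() != md5:
            problems.append((checksum, "md5_checksum mismatch"))
    return problems
```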
+
+==== Operating Procedures ====
+
+The steps should be carried out in the order specified.  (See Rationale below.)
+
+* To add a pristine, do the following inside an SDB txn:
+     * Add the table row, and set the refcount as desired.  If a row
+       already exists, add the desired refcount to its refcount, and
+       preferably verify the old row matches the new metadata.
+     * Create the file. Creation should be fs-atomic, e.g. by moving a
+       new file into place, so as never to orphan a partial file.  If a
+       file already exists, preferably leave it rather than replace it,
+       and optionally verify it matches the new metadata (e.g. length).
+
+* To remove a pristine, do the following inside an SDB txn:
+     * First, check refcount == 0, and abort if not.
+     * Delete the table row.
+     * Delete the file or move it away. (If not present, log a
+       consistency error but, in a release build, return success.)
+
+* To query a pristine's existence or SDB metadata, the reader must:
+     * Ensure no pristine-remove txn is in progress while querying it.
+
+* To read a pristine text, the reader must:
+     * Ensure no pristine-remove txn is in progress while querying and
+       opening it.
+     * Ensure the pristine text remains in the store continuously from
+       opening it for the duration of the read. (Perhaps by ensuring
+       refcount remains >= 1 and/or by cooperating with the clean-up
+       code.)
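The following Python sketch walks through the add, remove and read procedures above. Its assumptions are not taken from this spec:

* an autocommit `sqlite3` connection, so that `BEGIN IMMEDIATE` (which takes SQLite's RESERVED lock) and `COMMIT` are explicit;
* the flat `pristine/<sha1>` layout;
* a hypothetical `.removed` suffix for a file that has been moved away.

The remover moves the file away rather than deleting it. If the `COMMIT` then fails, the file can be put back and the invariant still holds. The reader also uses `BEGIN IMMEDIATE`: a plain SHARED lock would let it see the committed row while a remove txn has already moved the file.

```python
import hashlib
import logging
import os
import sqlite3
import tempfile

# Assumes: conn opened with isolation_level=None; PRISTINE table
# (checksum, size, refcount, md5_checksum); layout pristine_dir/<sha1>.

def add_pristine(conn, pristine_dir, data, refcount):
    sha1 = hashlib.sha1(data).hexdigest()
    md5 = hashlib.md5(data).hexdigest()
    conn.execute("BEGIN IMMEDIATE")            # RESERVED before any file I/O
    try:
        row = conn.execute("SELECT size, md5_checksum FROM PRISTINE"
                           " WHERE checksum = ?", (sha1,)).fetchone()
        if row is None:                        # step 1: add the row...
            conn.execute("INSERT INTO PRISTINE VALUES (?, ?, ?, ?)",
                         (sha1, len(data), refcount, md5))
        elif row != (len(data), md5):
            raise ValueError("existing PRISTINE row does not match")
        else:                                  # ...or add to its refcount
            conn.execute("UPDATE PRISTINE SET refcount = refcount + ?"
                         " WHERE checksum = ?", (refcount, sha1))
        path = os.path.join(pristine_dir, sha1)
        if not os.path.exists(path):           # step 2: keep an existing file
            fd, tmp = tempfile.mkstemp(dir=pristine_dir)
            with os.fdopen(fd, "wb") as f:
                f.write(data)
            os.replace(tmp, path)              # fs-atomic move into place
        conn.execute("COMMIT")
    except BaseException:
        conn.execute("ROLLBACK")
        raise
    return sha1

def remove_pristine(conn, pristine_dir, sha1):
    path = os.path.join(pristine_dir, sha1)
    moved = path + ".removed"                  # hypothetical "moved away" name
    moved_away = False
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute("SELECT refcount FROM PRISTINE WHERE checksum = ?",
                           (sha1,)).fetchone()
        if row is None or row[0] != 0:         # re-check inside the txn
            conn.execute("ROLLBACK")
            return False
        conn.execute("DELETE FROM PRISTINE WHERE checksum = ?", (sha1,))
        try:
            os.replace(path, moved)
            moved_away = True
        except FileNotFoundError:              # consistency error, not fatal
            logging.error("pristine %s: row present but file missing", sha1)
        conn.execute("COMMIT")
    except BaseException:
        if moved_away:
            os.replace(moved, path)            # undo the move before rollback
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
    if moved_away:
        os.remove(moved)                       # row is gone; safe to delete
    return True

def read_pristine(conn, pristine_dir, sha1):
    # Holding RESERVED while querying and opening excludes any remove txn.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute("SELECT 1 FROM PRISTINE WHERE checksum = ?",
                           (sha1,)).fetchone()
        f = open(os.path.join(pristine_dir, sha1), "rb") if row else None
    finally:
        conn.execute("ROLLBACK")               # nothing was written
    if f is None:
        return None
    with f:                                    # caller keeps refcount >= 1
        return f.read()
```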
+
+==== Rationale ====
+
+* Adding a pristine:
+     * We can't add the file *before* the SDB txn takes out a lock,
+       because that would leave a gap in which another process could
+       see this file as an orphan and delete it.
+     * Within the txn, the table row could be added after creating the
+       file; it makes no difference as it will not become externally
+       visible until commit.  But then we would have to take out a lock
+       explicitly before adding the file.  Adding the row takes out a
+       lock implicitly, so doing it first avoids an extra step.
+     * Leaving an existing file in place is less likely to interfere with
+       processes that are currently reading from the file.  Replacing it
+       might also be acceptable, but that would need further
+       investigation.
+
+* Removing a pristine:
+     * We can't remove the file *after* the SDB txn that updates the
+       table, because that would leave a gap in which another process
+       might re-add this same pristine file and then we would delete it.
+     * Within the txn, the table row could be removed after removing the
+       file, but see the rationale for adding a pristine.
+     * In a typical use case for removing a pristine text, the caller
+       would check the refcount before starting this txn, but
+       nevertheless it may have changed and so must be checked again
+       inside the txn.
+
+* In the add and remove txns, we need to acquire an SDB 'RESERVED'
+  lock before adding or removing the file.  This can be done by starting
+  the txn with 'BEGIN IMMEDIATE' and/or by performing an SDB write (such
+  as the table row update).  ### Would a 'SHARED' lock be sufficient,
+  and if so would it be noticeably better?
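The difference between the two lock levels can be observed directly with Python's `sqlite3` module. This illustrates standard SQLite rollback-journal locking, not Subversion code; the function name is hypothetical.

* `BEGIN IMMEDIATE` takes the RESERVED lock.
* A second connection has its busy timeout set to zero, so it fails instead of waiting. It cannot start another such txn.
* The same second connection can still read, because reads only need SHARED.

```python
import os
import sqlite3
import tempfile

def reserved_lock_demo(db_path):
    """Return (second_writer_blocked, reader_allowed) while RESERVED is held."""
    # timeout=0 disables the busy handler: fail at once with SQLITE_BUSY.
    writer = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    other = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    writer.execute("CREATE TABLE IF NOT EXISTS PRISTINE (checksum TEXT PRIMARY KEY)")
    writer.execute("BEGIN IMMEDIATE")           # acquires RESERVED at once
    try:
        try:
            other.execute("BEGIN IMMEDIATE")    # e.g. a concurrent add txn
            other.execute("ROLLBACK")
            second_writer_blocked = False
        except sqlite3.OperationalError:        # "database is locked"
            second_writer_blocked = True
        # A plain read only needs SHARED, which RESERVED still permits.
        rows = other.execute("SELECT COUNT(*) FROM PRISTINE").fetchall()
        reader_allowed = rows == [(0,)]
    finally:
        writer.execute("ROLLBACK")
        writer.close()
        other.close()
    return second_writer_blocked, reader_allowed
```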
+
+==== Notes ====
+
+* This procedure can leave orphaned pristine files (files without a
+  corresponding SDB row) if Subversion crashes.  The Pristine Store
+  will still operate correctly.  It should be easy to teach "svn cleanup"
+  to safely delete these.  ### Do we need to define the clean-up
+  procedure here?
+
+* This specification is conceptually simple, but requires completing disk
+  operations within SDB transactions, which may make it too inefficient
+  in practice.  An alternative specification could use the Work Queue to
+  enable more efficient processing of multiple transactions.
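The orphan clean-up could take the following shape. This is a Python sketch under the same assumptions (autocommit connection, flat `pristine/<sha1>` layout), with a hypothetical function name; it is not how `svn cleanup` is implemented. Taking the RESERVED lock first means no add txn can sit between creating its file and committing its row. Any SHA-1-named file without a committed row is therefore a genuine orphan. Other file names are left alone to stay conservative.

```python
import os
import re
import sqlite3

SHA1_NAME = re.compile(r"[0-9a-f]{40}")

def delete_orphan_pristines(conn, pristine_dir):
    """Delete SHA-1-named files that have no PRISTINE row; return their names.

    Assumes conn was opened with isolation_level=None.
    """
    removed = []
    conn.execute("BEGIN IMMEDIATE")             # no add txn can be in progress
    try:
        known = {r[0] for r in conn.execute("SELECT checksum FROM PRISTINE")}
        for name in os.listdir(pristine_dir):
            if SHA1_NAME.fullmatch(name) and name not in known:
                os.remove(os.path.join(pristine_dir, name))
                removed.append(name)
    finally:
        conn.execute("ROLLBACK")                # read-only txn: nothing to commit
    return sorted(removed)
```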
+
+
+REFERENCE COUNTING
+==================
+
+The Pristine Store spec above defines how texts are added and removed
+from the store.  This spec defines how the addition and removal of
+pristine text references within the WC DB are co-ordinated with the
+addition and removal of the pristine texts themselves.
+
+One requirement is to allow a pristine text to be stored some
+time before the reference to it is written into the NODES table.  The
+'commit' code path, for example, needs to store a file's new pristine
+text somewhere (and the pristine store is an obvious option) and then,
+when the commit succeeds, update the WC to reference it.
+
+Store-then-reference could be achieved by:
+
+  (a) Store text outside Pristine Store.  When commit succeeds, add it
+      to the Pristine Store and reference it in the WC; if commit
+      fails, remove the temporary text.
+  (b) Store text in Pristine Store with initial ref count = 0.  When
+      commit succeeds, add the reference and update the ref count; if
+      commit fails, optionally try to purge this pristine text.
+  (c) Store text in Pristine Store with initial ref count = 1.  When
+      commit succeeds, add the reference; if commit fails, decrement
+      the ref count and optionally try to purge it.
+
+Method (a) would require, in effect, implementing an ad-hoc temporary
+Pristine Store, which seems needless duplication of effort.  It would
+also require changing the way the commit code path passes information
+around, which might be no bad thing in the long term, but the result
+would not appear to have any advantage over method (b).
+
+Method (b) plays well with automatically maintaining the ref counts
+equal to the number of in-SDB references, at the granularity of SDB
+txns.  It requires an interlock between adding/deleting references and
+purging unreferenced pristines - e.g. guard each of these operations by
+a WC lock.
+  * Add a pristine & reference it => any WC lock
+    (To prevent purging it while adding.)
+  * Unreference a pristine => no lock needed.
+  * Unreference a pristine & purge-if-0 => Same as doing these separately.
+  * Purge any/all refcount==0 pristines => an exclusive WC lock.
+    (To prevent adding a ref while purging.)
+  * If a WC lock remains after a crash, then purge refcount==0 pristines.
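The store-then-reference flow of method (b) might look like the following Python sketch. It models only the SDB side: file I/O and WC locking are omitted. The two-column PRISTINE and NODES tables, keyed by a `local_relpath` column, are a toy stand-in for the real schema, and the function names are hypothetical. For simplicity it assumes the committed node has no existing reference. Between the two calls the text has refcount 0, which is why the caller needs a WC lock to keep a purge out.

```python
import hashlib
import sqlite3

# Toy schema: a stand-in for the real PRISTINE and NODES tables.
SCHEMA = """
CREATE TABLE PRISTINE (checksum TEXT PRIMARY KEY, refcount INTEGER NOT NULL);
CREATE TABLE NODES (local_relpath TEXT PRIMARY KEY, checksum TEXT);
"""

def store_unreferenced(conn, data):
    """Before the commit: store the new text with refcount 0."""
    sha1 = hashlib.sha1(data).hexdigest()
    with conn:                                  # one SDB txn
        conn.execute("INSERT OR IGNORE INTO PRISTINE VALUES (?, 0)", (sha1,))
    return sha1

def on_commit_succeeded(conn, relpath, sha1):
    """Add the reference and update the refcount in the same SDB txn."""
    with conn:
        conn.execute("INSERT INTO NODES VALUES (?, ?)", (relpath, sha1))
        conn.execute("UPDATE PRISTINE SET refcount = refcount + 1"
                     " WHERE checksum = ?", (sha1,))

def on_commit_failed(conn, sha1):
    """Optional purge; per the rules above this needs an exclusive WC lock."""
    with conn:
        conn.execute("DELETE FROM PRISTINE"
                     " WHERE checksum = ? AND refcount = 0", (sha1,))
```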
+
+Method (c):
+  * ### Not sure about this one - haven't thought it through in detail...
+  * Add a pristine & reference in separate steps => any WC lock (?)
+  * Remove a reference requires ... (nothing more?)
+  * Find & purge unreferenced pristines requires an exclusive WC lock.
+  * Ref counts may be too high while a WC lock is held, so if WC locks
+    remain after a crash the ref counts are uncertain and need to be
+    re-counted during clean-up.
+
+We choose method (b).
+
+
+=== Invariants in a Valid WC DB State ===
+
+* No pristine text, even if refcount == 0, will be deleted from the store
+  as long as any process holds any WC lock in this WC.
+
+The following conditions are always true outside of a SQL txn:
+
+  * The 'checksum' column in each NODES table row either is NULL or
+    references a primary key in the PRISTINE table.
+
+  * The 'refcount' column in each PRISTINE table row is equal to the
+    number of NODES table rows whose 'checksum' column references this
+    pristine row.
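Both conditions can be expressed as queries that return the offending rows, so an empty result means the invariant holds. The sketch below uses Python's `sqlite3` and a toy schema in which NODES is reduced to `local_relpath` and `checksum`; the real NODES table has more columns and a composite key. The query and function names are hypothetical.

```python
import sqlite3

# NODES rows whose non-NULL checksum has no PRISTINE row.
DANGLING_REFS = """
SELECT n.local_relpath, n.checksum FROM NODES n
WHERE n.checksum IS NOT NULL
  AND NOT EXISTS (SELECT 1 FROM PRISTINE p WHERE p.checksum = n.checksum)
"""

# PRISTINE rows whose refcount differs from the number of referencing NODES.
WRONG_REFCOUNTS = """
SELECT p.checksum, p.refcount,
       (SELECT COUNT(*) FROM NODES n WHERE n.checksum = p.checksum)
FROM PRISTINE p
WHERE p.refcount != (SELECT COUNT(*) FROM NODES n WHERE n.checksum = p.checksum)
"""

def check_refcount_invariants(conn: sqlite3.Connection):
    """Return (dangling_refs, wrong_refcounts); both empty when valid."""
    return (conn.execute(DANGLING_REFS).fetchall(),
            conn.execute(WRONG_REFCOUNTS).fetchall())
```

A row with refcount 0 and no references passes both checks, as it should.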
+
+The following conditions are always true outside of a SQL txn, when
+the Work Queue is empty, and (### ?) when no WC locks are held by any
+process:
+
+  * The 'refcount' column in a PRISTINE table row equals the number of
+    NODES table rows whose 'checksum' column references that pristine row.
+    It may be zero.
+
+==== Operating Procedures ====
+
+The steps should be carried out in the order specified.
+
+* To add a pristine text reference to the WC, obtain the text and its
+  checksum, and then do this while holding a WC lock:
+    * Add the pristine text to the Pristine Store, setting the desired
+      refcount >= 1.
+    * Add the reference(s) in the NODES table.
+
+* To remove a pristine text reference from the WC, do this while holding
+  a WC lock:
+    * Remove the reference(s) in the NODES table.
+    * Decrement the pristine text's 'refcount' column.
+
+* To purge an unreferenced pristine text, do this with an *exclusive*
+  WC lock:
+    * Check refcount == 0; skip if not.
+    * Remove it from the pristine store.
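The three procedures above could be sketched as follows, in Python over a toy two-table schema (NODES reduced to `local_relpath` and `checksum`).

* `immediate_txn` and the other function names are hypothetical.
* WC locking is assumed to be done by the caller, as each docstring states.
* The file side of purging (the Pristine Store remove procedure) is omitted.

Running both steps of each procedure in one SDB txn is one way to keep the refcount invariant from ever being visible in a broken state. The spec itself does not say whether the steps share a txn.

```python
import sqlite3
from contextlib import contextmanager

SCHEMA = """
CREATE TABLE PRISTINE (checksum TEXT PRIMARY KEY, refcount INTEGER NOT NULL);
CREATE TABLE NODES (local_relpath TEXT PRIMARY KEY, checksum TEXT);
"""

@contextmanager
def immediate_txn(conn):
    """Hypothetical helper: one SDB txn holding RESERVED from the start."""
    conn.execute("BEGIN IMMEDIATE")             # conn must be autocommit
    try:
        yield conn
    except BaseException:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")

def add_references(conn, sha1, relpaths):
    """Caller holds any WC lock and has already stored the text's file."""
    with immediate_txn(conn):
        # Step 1: add the text, raising refcount by the number of new refs.
        conn.execute("INSERT OR IGNORE INTO PRISTINE VALUES (?, 0)", (sha1,))
        conn.execute("UPDATE PRISTINE SET refcount = refcount + ?"
                     " WHERE checksum = ?", (len(relpaths), sha1))
        # Step 2: add the references in NODES.
        conn.executemany("INSERT INTO NODES VALUES (?, ?)",
                         [(p, sha1) for p in relpaths])

def remove_references(conn, relpaths):
    """Caller holds any WC lock."""
    with immediate_txn(conn):
        for p in relpaths:
            row = conn.execute("SELECT checksum FROM NODES"
                               " WHERE local_relpath = ?", (p,)).fetchone()
            if row is None or row[0] is None:
                continue                        # nothing referenced here
            # Step 1: remove the reference; step 2: decrement the refcount.
            conn.execute("UPDATE NODES SET checksum = NULL"
                         " WHERE local_relpath = ?", (p,))
            conn.execute("UPDATE PRISTINE SET refcount = refcount - 1"
                         " WHERE checksum = ?", (row[0],))

def purge_unreferenced(conn):
    """Caller holds an *exclusive* WC lock, so no reference can appear."""
    with immediate_txn(conn):
        dead = [r[0] for r in conn.execute(
            "SELECT checksum FROM PRISTINE WHERE refcount = 0")]
        conn.execute("DELETE FROM PRISTINE WHERE refcount = 0")
    return sorted(dead)
```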

