Julian Foad wrote on 2018-04-16:
Next steps for shelve & checkpoint.

* change the storage so we can shelve (large) binary files

* API abstraction to access shelf data

* store and retrieve base revisions

* more complete testing -- we should have a way of testing all possible kinds of change
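To make "all possible kinds of change" concrete, the test cases could be generated from a few orthogonal axes instead of being written by hand. A minimal Python sketch of that idea follows. The axes and their values are hypothetical placeholders, not Subversion's actual change model.

```python
from itertools import product

# HYPOTHETICAL axes for a shelve/unshelve test matrix; a real list would
# come from the kinds of local modification the WC can actually record.
NODE_KINDS = ("file", "dir")
CONTENT_KINDS = ("text", "binary")  # only meaningful for files
OPERATIONS = ("add", "delete", "modify-content", "modify-props", "replace", "copy")

def change_cases():
    """Yield every (node_kind, content_kind, operation) combination that makes sense."""
    for node, op in product(NODE_KINDS, OPERATIONS):
        if node == "dir":
            if op == "modify-content":
                continue  # directories have no text content to modify
            yield (node, None, op)
        else:
            for content in CONTENT_KINDS:
                yield (node, content, op)

cases = list(change_cases())
print(len(cases))  # 17: 6 ops x 2 content kinds for files, plus 5 ops for dirs
```

Each tuple would then drive one round trip: make the change, shelve, check the WC is clean, unshelve, and check the change is back.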


=== Storage for binary files ===

I made the shelve function walk the WC itself (r1829291), so we can intercept binary files at that point and do something other than diff them.

Since r1829295, it uses the git binary diff 'literal' format for binary files. This works as a stop-gap, but is inefficient for large files.

Soon I will change the behaviour at that interception point: instead of storing a (git binary literal) diff in the patch file, it will store a 'binary' file by copying the working version into a directory structure that parallels the WC directory structure, inside '.svn/shelves/<shelf-name-encoded>-<version>.d/'.

The file's properties can continue to go into the patch file as a property-diff, for the time being.

That should be fast enough for use with very large files.
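A minimal Python sketch of that storage layout follows. It assumes the shelf name is encoded as the hex of its UTF-8 bytes, so that any name is safe as a single path component; that encoding and the helper names are illustrative assumptions, not the branch's code.

```python
import shutil
import tempfile
from pathlib import Path

def encode_shelf_name(name: str) -> str:
    # ASSUMPTION: hex-encode the UTF-8 bytes so any shelf name is a safe
    # single path component. Illustrative, not the branch's actual encoding.
    return name.encode("utf-8").hex()

def shelf_copy_path(wc_root: Path, shelf_name: str, version: int, relpath: str) -> Path:
    """Map a WC-relative path to its place in the shelf's parallel tree."""
    shelf_dir = wc_root / ".svn" / "shelves" / f"{encode_shelf_name(shelf_name)}-{version}.d"
    return shelf_dir / relpath

def shelve_binary_file(wc_root: Path, shelf_name: str, version: int, relpath: str) -> Path:
    """Store the working version of a binary file whole (hypothetical helper)."""
    dest = shelf_copy_path(wc_root, shelf_name, version, relpath)
    dest.parent.mkdir(parents=True, exist_ok=True)  # mirror the WC directory shape
    shutil.copyfile(wc_root / relpath, dest)        # file contents only; props stay in the patch
    return dest

# Demo in a throwaway directory standing in for a WC root.
with tempfile.TemporaryDirectory() as tmp:
    wc = Path(tmp)
    (wc / "img").mkdir()
    (wc / "img" / "logo.png").write_bytes(b"\x89PNG fake image bytes")
    stored = shelve_binary_file(wc, "foo", 1, "img/logo.png")
    stored_rel = stored.relative_to(wc).as_posix()
    stored_bytes = stored.read_bytes()

print(stored_rel)  # .svn/shelves/666f6f-1.d/img/logo.png
```

A plain file copy costs one sequential read and write, which is why this scales to very large files where a base85-encoded literal diff does not.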

(And what does being classed as a 'binary' file mean, semantically? It means when a modified binary file is shelved and later re-applied to the WC, the modifications will not be merged, not even in the 'patch' sense, but instead the file will be copied as a whole, in the same way that 'svn update' and 'svn merge' handle a binary file.)

I have implemented the basics of this. I haven't finished the part that detects a conflict when unshelving. When that's done I'll commit.
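Since the conflict detection is not finished yet, here is one plausible rule, sketched in Python purely as an assumption: record a checksum of the base when shelving, and on unshelving replace the file whole only if the WC file still matches that base; otherwise report a conflict and leave the file alone. The function names and the 'applied'/'conflict' results are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha1_of(path: Path) -> str:
    """SHA-1 of a file, read in chunks so large binaries are not loaded whole."""
    h = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def unshelve_binary_file(shelved_copy: Path, wc_file: Path, base_sha1: str) -> str:
    """Restore a shelved binary file whole; never merge its contents.

    HYPOTHETICAL conflict rule: overwrite only if the WC file still matches
    the base the change was shelved against (identified by base_sha1),
    i.e. nothing has touched it since shelving reverted it.
    """
    if not wc_file.exists() or sha1_of(wc_file) != base_sha1:
        return "conflict"  # WC changed since shelving: leave it alone
    shutil.copyfile(shelved_copy, wc_file)
    return "applied"

# Demo: the shelf holds b"v2"; it was made against a base containing b"v1".
with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    shelved = d / "shelved.bin"
    shelved.write_bytes(b"v2")
    base_sha1 = hashlib.sha1(b"v1").hexdigest()

    clean = d / "clean.bin"
    clean.write_bytes(b"v1")  # still at the base
    result_clean = unshelve_binary_file(shelved, clean, base_sha1)
    content_clean = clean.read_bytes()

    dirty = d / "dirty.bin"
    dirty.write_bytes(b"local edit")  # modified after shelving
    result_dirty = unshelve_binary_file(shelved, dirty, base_sha1)
    content_dirty = dirty.read_bytes()
```

This mirrors the whole-file semantics described above: there is no partial application, only "replace" or "conflict".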


Ideally shelves will be able to share the WC pristine store for storing whole file contents. [...]
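The attraction of the pristine store is that it is content-addressed: Subversion keeps each pristine text once, named by its SHA-1 checksum, under '.svn/pristine/'. A shelf could then record just a checksum per file. The Python sketch below models that idea loosely; the class is hypothetical, and it reads whole files into memory where a real implementation would stream.

```python
import hashlib
import tempfile
from pathlib import Path

class ContentStore:
    """Hypothetical content-addressed store: one copy per distinct content."""

    def __init__(self, root: Path):
        self.root = root

    def _path(self, sha1: str) -> Path:
        # Fan out into two-character subdirectories, as the WC pristine store does.
        return self.root / sha1[:2] / (sha1 + ".svn-base")

    def put(self, data: bytes) -> str:
        sha1 = hashlib.sha1(data).hexdigest()
        path = self._path(sha1)
        if not path.exists():  # identical content is already stored: nothing to write
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(data)
        return sha1

    def get(self, sha1: str) -> bytes:
        return self._path(sha1).read_bytes()

with tempfile.TemporaryDirectory() as tmp:
    store = ContentStore(Path(tmp) / "pristine")
    big = b"\x00\x01" * 100_000
    # Two shelves holding the same large file each record only a checksum.
    shelf_a = {"img/logo.png": store.put(big)}
    shelf_b = {"img/logo.png": store.put(big)}
    stored_files = sum(1 for p in store.root.rglob("*") if p.is_file())
    roundtrip = store.get(shelf_b["img/logo.png"])

print(stored_files)  # 1
```

One consequence worth noting (an inference, not something stated above): the real pristine store is reference-counted in wc.db, so shelved texts would need to count as references or a cleanup could discard them.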


=== API abstraction ===

We need libsvn_client APIs to be able to access shelves in the same way as "regular" WC data: export|diff|cat|propget|... for data stored in any shelf. The result of any such API operating on a shelf should be analogous to how the same function would operate on the WC if we first unshelved the change.

Why, we might ask, do we need generic APIs to support these kinds of functions? It's not because the user necessarily needs all these operations, but to keep the programming sane. It should be possible to write a conceptually simple high-level operation such as "copy all the changes found in this WC subtree to this shelf" by setting up the source and destination objects and then invoking a common "copy a tree of changes" routine, not by writing a new deep implementation of all the guts specifically for this source-and-destination pair.
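A minimal Python sketch of that separation: the generic routine only sees a source that yields changes and a sink that accepts them, so a new source/destination pair needs two small adapters rather than new guts. The Change type, the protocols and the dictionary-backed stand-ins are all hypothetical, not the 'tree-api' branch's interfaces.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Protocol

@dataclass
class Change:
    relpath: str
    content: Optional[bytes]  # None means "deleted"
    props: Dict[str, str] = field(default_factory=dict)

class ChangeSource(Protocol):
    def changes(self, subtree: str) -> Iterator[Change]: ...

class ChangeSink(Protocol):
    def put_change(self, change: Change) -> None: ...

def copy_changes(source: ChangeSource, sink: ChangeSink, subtree: str = "") -> int:
    """Generic 'copy a tree of changes': knows nothing about WCs or shelves."""
    count = 0
    for change in source.changes(subtree):
        sink.put_change(change)
        count += 1
    return count

class DictWC:
    """Stand-in for 'the changes in a WC'."""
    def __init__(self, changes: List[Change]):
        self._changes = changes

    def changes(self, subtree: str) -> Iterator[Change]:
        for c in self._changes:
            if subtree == "" or c.relpath == subtree or c.relpath.startswith(subtree + "/"):
                yield c

class DictShelf:
    """Stand-in for 'a shelf'."""
    def __init__(self):
        self.stored: Dict[str, Change] = {}

    def put_change(self, change: Change) -> None:
        self.stored[change.relpath] = change

wc = DictWC([Change("src/a.c", b"x"), Change("src/b.c", None), Change("doc/r.txt", b"y")])
shelf = DictShelf()
copied = copy_changes(wc, shelf, "src")
```

Unshelving, or diffing a shelf against the WC, would reuse copy_changes with the roles of the adapters swapped.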

I have started working on such APIs in the 'tree-api' branch (recently resurrected from my years-old 'tree-read-api' branch).

A possible starting point, currently implemented on the 'shelve-checkpoint' branch, is to modify svn_opt_revision_t and the revision-number parsing to accept a shelf name as another kind of revision specifier. So far this works for the revprop functions, for example:

   $ svn propget -r foo --revprop svn:log
   This is the log message of shelf 'foo'.
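The parsing side of that idea might look like the Python sketch below. It is simplified (single revisions only, no N:M ranges) and assumes, hypothetically, that any argument that is not a number, a keyword or a {DATE} is taken as a shelf name; that rule is an illustration, not necessarily what the branch does.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Union

class RevKind(Enum):
    NUMBER = "number"
    KEYWORD = "keyword"
    DATE = "date"
    SHELF = "shelf"  # hypothetical new kind of revision specifier

@dataclass(frozen=True)
class RevSpec:
    kind: RevKind
    value: Union[int, str]

KEYWORDS = {"HEAD", "BASE", "COMMITTED", "PREV"}

def parse_revision(arg: str) -> RevSpec:
    """Parse one -r argument (no N:M ranges in this sketch)."""
    if re.fullmatch(r"[0-9]+", arg):
        return RevSpec(RevKind.NUMBER, int(arg))
    if arg.upper() in KEYWORDS:  # keywords are case-insensitive
        return RevSpec(RevKind.KEYWORD, arg.upper())
    if len(arg) >= 2 and arg.startswith("{") and arg.endswith("}"):
        return RevSpec(RevKind.DATE, arg[1:-1])
    if arg:
        # ASSUMPTION: anything else names a shelf.
        return RevSpec(RevKind.SHELF, arg)
    raise ValueError("empty revision argument")

print(parse_revision("foo"))
```

One consequence of that fallback rule is that a shelf named, say, 'head' could never be addressed this way, so a real design may want an unambiguous syntax for shelf names.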


=== Store and retrieve base revisions ===

Storing the revision-number metadata is easy. The svn diff format has always recorded the base revision of each file in the diff header. The recent 'svn info --viewspec' prototype now provides a way to write a complete description of the revisions and 'shape' (depth and switch settings) of a WC.

Reading it out is a SMOP. Doing something with it -- that is, doing a 3-way-merge instead of a 'patch' operation -- is conceptually a SMOP but probably more involved.
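Reading the numbers back out of a stored patch can be as simple as scanning the '---' header lines, which svn diff writes as '--- PATH<TAB>(revision N)'. A minimal Python sketch; it deliberately ignores other header forms and does not track hunk boundaries, so a removed content line that happens to look like a header would be misread.

```python
import re
from typing import Dict

# Matches the "old" side of an svn diff header, e.g. "--- foo.c\t(revision 123)".
OLD_HEADER = re.compile(r"^--- (?P<path>.+)\t\(revision (?P<rev>\d+)\)$")

def base_revisions(patch_text: str) -> Dict[str, int]:
    """Return {path: base revision} read from the '---' lines of an svn diff."""
    revs: Dict[str, int] = {}
    for line in patch_text.splitlines():
        m = OLD_HEADER.match(line)
        if m:
            revs[m.group("path")] = int(m.group("rev"))
    return revs

sample = (
    "Index: trunk/foo.c\n"
    "===================================================================\n"
    "--- trunk/foo.c\t(revision 1234)\n"
    "+++ trunk/foo.c\t(working copy)\n"
    "@@ -1 +1 @@\n"
    "-old line\n"
    "+new line\n"
)
print(base_revisions(sample))  # {'trunk/foo.c': 1234}
```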

Snapshotting the actual content of the base is much more involved if we intend to keep this snapshot attached to each shelf even after the user runs 'update'. To decide whether that is important, I suggest we first implement a version that uses just the revision-number metadata and see how well it performs -- accepting that either repository access or a fallback to plain patching would be needed in cases where 'update' has been done.
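The proposed approach amounts to a per-file decision at unshelve time, sketched below in Python. The strategy names are hypothetical. The one inference beyond the message is that when the WC node's base revision still equals the shelf's recorded base (same repository path, not switched), the WC pristine already holds the shelf's base text, so no repository access is needed.

```python
from enum import Enum
from typing import Optional

class Strategy(Enum):
    THREE_WAY_LOCAL = "3-way merge, base taken from the WC pristine"
    THREE_WAY_REPO = "3-way merge, base fetched from the repository"
    PLAIN_PATCH = "plain patch (no base available)"

def choose_strategy(shelf_base_rev: Optional[int], wc_base_rev: int,
                    repo_reachable: bool) -> Strategy:
    """Pick how to re-apply one shelved file (hypothetical decision rule)."""
    if shelf_base_rev is None:
        return Strategy.PLAIN_PATCH      # no revision metadata was recorded
    if shelf_base_rev == wc_base_rev:
        return Strategy.THREE_WAY_LOCAL  # no 'update' since shelving
    if repo_reachable:
        return Strategy.THREE_WAY_REPO   # 'update' happened: fetch the old base
    return Strategy.PLAIN_PATCH          # 'update' happened and we are offline
```

Measuring how often each branch is taken in practice would answer the question of whether snapshotting the base content is worth the extra complexity.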


Glad to hear any thoughts.

- Julian
