Julian Foad wrote on 2018-04-16:
Next steps for shelve & checkpoint.
* change the storage so we can shelve (large) binary files
* API abstraction to access shelf data
* store and retrieve base revisions
* more complete testing -- we should have a way of testing all possible
kinds of change
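One way to get systematic coverage is to generate the test matrix rather than writing cases by hand. The Python sketch below enumerates combinations of node kind, content kind and operation. The categories and names are illustrative assumptions, not a list taken from the Subversion test suite.

```python
from itertools import product

# Hypothetical dimensions of a WC change, for illustration only; a real
# matrix might also need symlinks, tree changes, mixed revisions, etc.
NODE_KINDS = ["file", "dir"]
CONTENT_KINDS = ["text", "binary"]
OPERATIONS = ["add", "delete", "modify-text", "modify-props",
              "copy", "move", "replace"]


def change_kinds():
    """Yield each (node kind, content kind, operation) combination once.

    Directories have no content, so they get the content kind "n/a" and
    skip "modify-text".
    """
    for node, content, op in product(NODE_KINDS, CONTENT_KINDS, OPERATIONS):
        if node == "dir":
            if content != "text" or op == "modify-text":
                continue  # emit each directory combination only once
            content = "n/a"
        yield (node, content, op)


cases = list(change_kinds())
```

Each tuple would then drive one test: make that change in a WC, shelve it, unshelve it, and compare the result.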
=== Storage for binary files ===
I made the shelve function walk the WC itself (r1829291), so we can
intercept binary files at that point and handle them some other way than
by diffing them.
Since r1829295, it stores binary files in the git diff 'binary literal'
format. This works as a stop-gap, but is inefficient for large files.
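For reference, a git 'literal' binary hunk stores the whole new file. The content is zlib-compressed and cut into chunks of at most 52 bytes. Each chunk becomes one text line of base85, prefixed by a letter giving the chunk length. The Python sketch below uses hypothetical function names. It shows why this scales poorly: the entire file is compressed and re-encoded as text, and base85 alone turns every 4 bytes into 5 characters. In a full git diff this hunk follows a 'GIT binary patch' line and is usually paired with a reverse hunk. The sketch covers only the forward literal.

```python
import base64
import zlib


def _length_char(n: int) -> str:
    # 1..26 -> 'A'..'Z', 27..52 -> 'a'..'z'
    return chr(ord("A") + n - 1) if n <= 26 else chr(ord("a") + n - 27)


def git_binary_literal(data: bytes) -> str:
    """Encode data as a git 'literal' binary hunk (forward direction only)."""
    compressed = zlib.compress(data)
    lines = [f"literal {len(data)}"]  # header carries the uncompressed size
    for i in range(0, len(compressed), 52):
        chunk = compressed[i:i + 52]
        # pad=True zero-pads the last 4-byte group, as git's encoder does;
        # the length character tells the decoder how many bytes are real.
        encoded = base64.b85encode(chunk, pad=True).decode("ascii")
        lines.append(_length_char(len(chunk)) + encoded)
    return "\n".join(lines) + "\n"


def decode_git_binary_literal(hunk: str) -> bytes:
    """Inverse of git_binary_literal(), used here for round-trip checks."""
    header, *body = hunk.splitlines()
    size = int(header.split()[1])
    compressed = bytearray()
    for line in body:
        c = line[0]
        n = ord(c) - ord("A") + 1 if c.isupper() else ord(c) - ord("a") + 27
        compressed += base64.b85decode(line[1:])[:n]  # drop the padding
    data = zlib.decompress(bytes(compressed))
    if len(data) != size:
        raise ValueError(f"expected {size} bytes, got {len(data)}")
    return data
```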
Soon I will change the shelve function at that interception point, as
follows. Instead of storing a (git binary literal) diff in the patch
file, it will store a 'binary' file by copying the working version into
a directory structure that parallels the WC directory structure, inside
'.svn/shelves/<shelf-name-encoded>-<version>.d/'.
The file's properties can continue to go into the patch file as a
property-diff, for the time being.
That should be fast enough for use with very large files.
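A minimal Python sketch of that storage step. It assumes the shelf name is hex-encoded; the text above says only '<shelf-name-encoded>', so the encoding and the helper names here are hypothetical.

```python
import shutil
from pathlib import Path


def shelf_storage_dir(wc_root: Path, shelf_name: str, version: int) -> Path:
    """'.svn/shelves/<shelf-name-encoded>-<version>.d' for one shelf version.

    ASSUMPTION: hex-encoding of the UTF-8 name stands in for whatever
    encoding the real implementation uses.
    """
    encoded = shelf_name.encode("utf-8").hex()
    return wc_root / ".svn" / "shelves" / f"{encoded}-{version}.d"


def shelve_binary_file(wc_root: Path, relpath: str,
                       shelf_name: str, version: int) -> Path:
    """Copy the working version of a binary file into the shelf's tree,
    at the same relative path it has in the WC; no diff is computed."""
    src = wc_root / relpath
    dst = shelf_storage_dir(wc_root, shelf_name, version) / relpath
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dst)  # streams the bytes; the file is never held in memory whole
    return dst
```

Because the copy mirrors the WC path, finding the shelved content of a file later is a simple path computation rather than a search through a patch file.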
(And what does being classed as a 'binary' file mean, semantically? It
means that when a modified binary file is shelved and later re-applied
to the WC, the modifications will not be merged, not even in the 'patch'
sense; instead the file will be copied as a whole, in the same way that
'svn update' and 'svn merge' handle a binary file.)
I have implemented the basics of this. I haven't finished the part that
detects a conflict when unshelving. When that's done I'll commit.
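One possible shape for such a conflict check, sketched in Python. When shelving, record the checksum of the file the change was based on. When unshelving, refuse to overwrite the WC file if it no longer matches. This is an assumption about how the check could work, not a description of the implementation. `base_sha1` is None for a file that the shelved change added.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Optional


class UnshelveConflict(Exception):
    """The WC file no longer matches the base the shelved change was made on."""


def sha1_of(path: Path) -> str:
    """SHA-1 of a file, read in 64 KiB blocks so large files are not loaded whole."""
    h = hashlib.sha1()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(64 * 1024), b""):
            h.update(block)
    return h.hexdigest()


def unshelve_binary_file(shelved_copy: Path, wc_file: Path,
                         base_sha1: Optional[str]) -> None:
    """Restore a shelved binary file by copying it whole into the WC.

    base_sha1 is assumed to have been recorded at shelve time (None if the
    shelved change added the file). Binary content cannot be merged, so any
    difference between the WC file and that base is reported as a conflict.
    """
    current = sha1_of(wc_file) if wc_file.exists() else None
    if current != base_sha1:
        raise UnshelveConflict(f"{wc_file}: changed since it was shelved")
    wc_file.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(shelved_copy, wc_file)
```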
Ideally shelves will be able to share the WC pristine store for storing
whole file contents. [...]
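The property that makes a pristine store shareable is content addressing. Files are stored under their checksum, so identical contents are stored once no matter how many shelves refer to them. The Python sketch below illustrates the idea. The class name and layout are hypothetical and do not reproduce the real '.svn/pristine' format.

```python
import hashlib
from pathlib import Path


class ContentStore:
    """Content-addressed store: each blob lives at a path derived from its SHA-1.

    Hypothetical layout, not Subversion's real pristine store format.
    """

    def __init__(self, root: Path) -> None:
        self.root = root

    def _path(self, sha1: str) -> Path:
        # Fan out by the first two hex digits to keep directories small.
        return self.root / sha1[:2] / sha1

    def put(self, data: bytes) -> str:
        """Store data if not already present; return its SHA-1 key."""
        sha1 = hashlib.sha1(data).hexdigest()
        path = self._path(sha1)
        if not path.exists():
            path.parent.mkdir(parents=True, exist_ok=True)
            tmp = path.with_name(sha1 + ".tmp")
            tmp.write_bytes(data)
            tmp.replace(path)  # rename into place so readers never see a partial file
        return sha1

    def get(self, sha1: str) -> bytes:
        return self._path(sha1).read_bytes()
```

A shelf would then record only the SHA-1 of each whole-file content, and two shelves containing the same large file would cost one copy.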
=== API abstraction ===
We need libsvn_client APIs to be able to access shelves in the same way
as "regular" WC data: export|diff|cat|propget|... for data stored in any
shelf. The result of any such API operating on a shelf should be
analogous to how the same function would operate on the WC if we first
unshelved the change.
Why, we might ask, do we need generic APIs to support these kinds of
functions? Not because the user necessarily needs all these operations,
but to make programming sane. It should be possible to write
a conceptually simple high-level operation such as "copy all the changes
found in this WC subtree to this shelf" by setting up the source and
destination objects and then invoking a common "copy a tree of changes"
routine, not by writing a new deep implementation of all the guts
specifically for this source-and-destination pair.
I have started working on such APIs in the 'tree-api' branch (recently
resurrected from my years-old 'tree-read-api' branch).
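The idea of a common routine can be sketched in Python. The copy loop depends only on two small interfaces, so a WC, a shelf, or any other store can be plugged in at either end. All names here (`Change`, `ChangeSource`, `ChangeSink`, `copy_changes`, `InMemoryShelf`) are hypothetical and much simpler than what a real tree API would need.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Optional, Protocol


@dataclass(frozen=True)
class Change:
    """A single node change (simplified, hypothetical representation)."""
    relpath: str
    action: str                     # e.g. "add", "delete", "modify"
    content: Optional[bytes] = None


class ChangeSource(Protocol):
    def changes(self, subtree: str) -> Iterator[Change]: ...


class ChangeSink(Protocol):
    def apply(self, change: Change) -> None: ...


def copy_changes(source: ChangeSource, sink: ChangeSink, subtree: str = "") -> int:
    """Generic 'copy a tree of changes' routine; returns the number copied.

    It knows nothing about WCs or shelves, only the two interfaces above.
    """
    count = 0
    for change in source.changes(subtree):
        sink.apply(change)
        count += 1
    return count


class InMemoryShelf:
    """Toy store that is both a source and a sink of changes."""

    def __init__(self) -> None:
        self.stored: Dict[str, Change] = {}

    def apply(self, change: Change) -> None:
        self.stored[change.relpath] = change

    def changes(self, subtree: str) -> Iterator[Change]:
        for path in sorted(self.stored):
            if not subtree or path == subtree or path.startswith(subtree + "/"):
                yield self.stored[path]
```

"Shelve", "unshelve" and "copy one shelf to another" then become the same call with different source and sink objects.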
A possible starting point, currently implemented on the
'shelve-checkpoint' branch, is to modify svn_opt_revision_t and the
revision-number parsing to accept a shelf name as another kind of
revision specifier. So far this works for propget and the other revprop
functions:
$ svn propget -r foo --revprop svn:log
This is the log message of shelf 'foo'.
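The parsing change amounts to one more fallback case. The Python sketch below assumes that anything not recognized as a number, one of the keywords HEAD, BASE, COMMITTED or PREV, or a {date} is taken as a shelf name. The class names are hypothetical; the real parser is C code built around svn_opt_revision_t.

```python
import re
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class RevNumber:
    number: int


@dataclass(frozen=True)
class RevKeyword:
    keyword: str


@dataclass(frozen=True)
class RevDate:
    date: str


@dataclass(frozen=True)
class RevShelf:
    name: str


RevSpec = Union[RevNumber, RevKeyword, RevDate, RevShelf]
KEYWORDS = {"HEAD", "BASE", "COMMITTED", "PREV"}


def parse_revision(spec: str) -> RevSpec:
    """Parse a -r argument, treating anything unrecognized as a shelf name."""
    if re.fullmatch(r"[0-9]+", spec):
        return RevNumber(int(spec))
    if spec.upper() in KEYWORDS:  # this sketch matches keywords case-insensitively
        return RevKeyword(spec.upper())
    if len(spec) > 2 and spec.startswith("{") and spec.endswith("}"):
        return RevDate(spec[1:-1])
    return RevShelf(spec)  # new case: "-r foo" names shelf 'foo'
```

One consequence of this fallback design is that a shelf whose name looks like a number or keyword (say 'head') cannot be referred to this way, so an explicit marker in the syntax may be worth considering.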
=== Store and retrieve base revisions ===
Storing the revision number metadata is easy. The svn diff format has
always recorded the base revision of each file in the diff header. The
recent 'svn info --viewspec' prototype now provides a way to write out a
complete description of the revisions and 'shape' (depth and switch
settings) of a WC.
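As an example of how little is needed on the reading side, the base revisions can be recovered from an svn-format diff with a single pattern. The Python sketch below assumes the usual old-side header line of the form '--- path<TAB>(revision N)'. The function name is hypothetical.

```python
import re
from typing import Dict

# Matches e.g. "--- foo.c\t(revision 1234)"; headers such as
# "(working copy)" or "(nonexistent)" simply do not match.
_OLD_HEADER = re.compile(r"^--- (.+)\t\(revision (\d+)\)$", re.MULTILINE)


def base_revisions(diff_text: str) -> Dict[str, int]:
    """Map each path in an svn-format diff to the base revision in its header.

    Simplification: it does not check that a matching line really is a
    header rather than hunk content that happens to look like one.
    """
    return {m.group(1): int(m.group(2)) for m in _OLD_HEADER.finditer(diff_text)}
```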
Reading it out is a SMOP. Doing something with it -- that is, doing a
3-way-merge instead of a 'patch' operation -- is conceptually a SMOP but
probably more involved.
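The difference from 'patch' is that a 3-way merge also knows the base. At whole-file granularity, which is how binary files are already treated, the decision reduces to a few comparisons. The Python sketch below is a hypothetical function with no line-level merging; a text merge would need a diff3-style algorithm on top of this.

```python
from typing import Optional


def merge3(base: bytes, working: bytes, shelved: bytes) -> Optional[bytes]:
    """Whole-file 3-way merge of a shelved version into the WC version.

    Returns the merged content, or None if both sides changed the file in
    different ways (a conflict).
    """
    if working == shelved:
        return working      # both sides agree
    if working == base:
        return shelved      # only the shelf changed it: take the shelf's version
    if shelved == base:
        return working      # only the WC changed it: keep the WC version
    return None             # both changed it differently: conflict
```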
Snapshotting the actual content of the base is much more involved if we
intend to keep this snapshot attached to each shelf even after the user
runs 'update'. To decide whether that is important, I suggest we first
implement using just the revision number metadata and test how well it
performs, accepting that either repository access or a fallback to plain
patching would be needed in cases where 'update' has been done.
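That proposal can be sketched per file in Python. Use the local base when the WC is still at the shelf's base revision; otherwise fetch the base from the repository if it is reachable, or else fall back to patching. The names are hypothetical. Comparing revision numbers is also a conservative simplification, since a file can be unchanged across several revisions.

```python
from enum import Enum


class Strategy(Enum):
    MERGE_LOCAL_BASE = "3-way merge, base taken from the WC"
    MERGE_REPOS_BASE = "3-way merge, base fetched from the repository"
    PLAIN_PATCH = "plain patch"


def choose_strategy(shelf_base_rev: int, wc_base_rev: int,
                    repos_reachable: bool) -> Strategy:
    """Pick how to unshelve one file, using only revision-number metadata."""
    if shelf_base_rev == wc_base_rev:
        return Strategy.MERGE_LOCAL_BASE   # WC still has the shelf's base text
    if repos_reachable:
        return Strategy.MERGE_REPOS_BASE   # 'update' moved the WC; refetch the base
    return Strategy.PLAIN_PATCH            # no base available: patch as before
```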
Glad to hear any thoughts.
- Julian