Distributed RCS (was: Re: what constitutes integrity?)

Klaus T. Aehlig Sat, 21 Jul 2012 05:51:04 -0700

> So, what do people think should go into CACHED-HASH-INFO?

In my opinion, we should use it only once we really need it. It is an
escape hatch, should the necessity to store more global data arise -- not
an extension to be used for no good reason.


Concerning distributed use of rcs, I think we can go without.

My vision would be that everyone works locally and, upon receiving an
rcs file, just "attaches" the missing commits to the local rcs file.
"Attaching" here means producing a file that looks as if the other
party had locally checked out the parent version, modified it and
committed (but with commit time and author set as in the received file).
That is, commits extending the head of a branch just continue that branch,
and commits extending a version that is not a head start a new branch.
Merging the branches would then be done with rcsmerge, as usual.

So the task is to recognise a version in the local rcs file, even though
it has a different version number in the received file (and, of course,
to recognise two versions as different even though their version numbers
coincide). To do so, we compute for every revision a unique identifier
that captures the "semantics" of that version. My suggestion is to take
a hash of (a string from which you can reconstruct)

- the content
- the log message
- the identifier of the parent commit

Then two versions would be considered equal if their semantic identifiers
coincide, and every new commit would be attached as a new child of its
parent commit (i.e., the commit in the old file that has the same
identifier as the parent). Traversing from the initial revision onwards
ensures that we always attach the parent first.
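
To make this concrete, here is a rough Python sketch of the identifier
and of deciding which commits need attaching. It is not taken from
rcsjoin.py; the Rev tuple, the function names, the choice of SHA-256 and
the length-prefixed encoding are all assumptions made for illustration.
Revisions are assumed to be listed parent first.

```python
import hashlib
from typing import NamedTuple, Optional


class Rev(NamedTuple):
    """Hypothetical in-memory view of one revision of an rcs file."""
    number: str            # rcs revision number, e.g. "1.2"
    parent: Optional[str]  # revision number of the parent, None for the initial one
    content: bytes
    log: bytes


def semantic_id(content: bytes, log: bytes, parent_id: str) -> str:
    """Hash a string from which content, log and parent id can be reconstructed."""
    h = hashlib.sha256()  # assumption: any collision-resistant hash would do
    for field in (content, log, parent_id.encode("ascii")):
        # Length-prefix each field so that field boundaries are unambiguous.
        h.update(str(len(field)).encode("ascii") + b":")
        h.update(field)
    return h.hexdigest()


def identify(revs):
    """Map revision number -> semantic identifier (revs listed parent first)."""
    ids = {}
    for r in revs:
        parent_id = ids[r.parent] if r.parent is not None else ""
        ids[r.number] = semantic_id(r.content, r.log, parent_id)
    return ids


def commits_to_attach(local, received):
    """Return (parent identifier, revision) for each received commit unknown locally."""
    known = set(identify(local).values())
    received_ids = identify(received)
    todo = []
    for r in received:  # parent first, so a parent is always attached before its child
        rid = received_ids[r.number]
        if rid not in known:
            parent_id = received_ids[r.parent] if r.parent is not None else ""
            todo.append((parent_id, r))
            known.add(rid)
    return todo
```

Version numbers play no role in the comparison: a received 1.2 that
differs from the local 1.2 ends up in the list, while a received 1.2 that
already exists locally as, say, 1.1.1.1 does not. Choosing the new version
number (continue the branch or open a new one) is left out of the sketch.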

To get a feeling for the semantics, you can use my old prototype
script[1], which I'm still actively using today (despite its quadratic
complexity, which is a bit annoying). However, I would suggest a different
implementation based directly on the rcs functions. Essentially, we
would have to traverse every version, starting from the initial one,
to compute the recursive hashes. Unfortunately, the diffs are oriented
the other way on the path HEAD ... 1.1. My suggestion therefore would be to
traverse that path as you would do for co -r1.1, but additionally compute
the reverse diffs. Then all versions can be traversed, and the hashes
computed, without having more than one full version of the file in
memory at a time. The effort would be linear in the number of revisions
in the file, as is the effort for checking out the initial version in
a typical case. Therefore my suggestion is to implement attaching without
using the extension. This keeps us flexible for the future.
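
The traversal could look roughly like the following Python sketch. It
does not use the rcs functions; the edit scripts are modelled as lists of
tuples ("d", line, count) and ("a", line, lines), loosely following the
delete/add commands of rcs deltas, and both function names are made up
for illustration. The point is that a script can be inverted while it is
being applied, so walking down to 1.1 yields the scripts for walking back
up, and only the current text plus the scripts are kept around.

```python
def apply_and_invert(src, script):
    """Apply an edit script to the lines in src.

    Line numbers in the script are 1-based and refer to src. Returns the
    resulting lines together with the script that undoes the change.
    """
    dst, inverse, pos = [], [], 0          # pos: index of the next uncopied src line
    for op, line, arg in script:
        if op == "d":                      # delete `arg` lines starting at `line`
            dst.extend(src[pos:line - 1])
            removed = src[line - 1:line - 1 + arg]
            pos = line - 1 + arg
            inverse.append(("a", len(dst), removed))
        else:                              # "a": add the lines in `arg` after `line`
            dst.extend(src[pos:line])
            pos = line
            inverse.append(("d", len(dst) + 1, len(arg)))
            dst.extend(arg)
    dst.extend(src[pos:])
    return dst, inverse


def trunk_oldest_first(head_text, reverse_deltas):
    """Yield the trunk versions from the initial revision up to the head.

    reverse_deltas[0] turns the head into its predecessor, and so on down
    to the initial revision, as stored on the trunk of an rcs file.
    """
    text, forward = head_text, []
    for script in reverse_deltas:          # walk down, as co -r1.1 would
        text, inv = apply_and_invert(text, script)
        forward.append(inv)
    yield text                             # the initial revision
    for script in reversed(forward):       # walk back up using the inverted scripts
        text, _ = apply_and_invert(text, script)
        yield text
```

Each version produced on the way up can be fed into the recursive hash
together with its log message and the identifier of its parent. Branches,
whose diffs already point away from the trunk, are not covered here.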

Klaus

[1] http://www.linta.de/~aehlig/university/rcsjoin.py
