>--- Forwarded mail from [EMAIL PROTECTED]
>
>--- Paul Sander <[EMAIL PROTECTED]> wrote:
>
>> Unfortunately, modules can overlap, which means that the new
>> locking mechanism must have a granularity smaller than the
>> module.
>>
>> Depending on how the mapping is done, especially if there's an
>> attempt to preserve backward compatibility (and preserving CVS'
>> existing directory-based locking in the repository), then you
>> end up locking every directory that contains a ,v file that
>> maps to a file in the user's sandbox, every time an update or
>> commit is done.
>Some more thoughts...
>
>Each file and directory is mapped to a ,v archive file. The
>contents of the directory archive files are the mappings of
>its elements and the types (e.g. file or directory) of those
>elements. The basenames of the archive files will be hex
>representations of random 256-bit numbers generated with a
>"secure" version of the Mersenne Twister algorithm.

Why not just sequentially number the containers? Or use a
timestamp plus a random element to name them? (The first
sketch below shows what generating such names might look like,
and the tradeoff involved.)

>Locking will occur on a per-repository basis. Permissions can
>still be done on a per-directory basis.

Per-directory permissions are tough if files are linked to
multiple directories that have different permissions.

>Since the repository structure will no longer be
>directory-based, module definitions like "module
>path/to/module" won't be supported.

Correct, module definitions become implicit in the directory
mappings. However, there's still a gotcha at the top level:
the things you give as arguments to the "cvs checkout" command
need to be treated specially in some way so that they can be
located correctly. Limiting operations to adds and renames
(without replacement if the target already exists) is a start
when considering this. Also, I was considering using a special
container name of "0" to locate the top-level definitions.
(The second sketch below illustrates such a directory mapping.)

>I think a transition from an old repository to the above
>shouldn't be too bad, assuming people don't have complicated
>module definitions. For those with complicated module
>definitions, a switch could be provided to use the old style
>(the default would be to support backward compatibility). A
>tool can also be provided to convert the old repo into a new
>repo.

I think that mapping the modules database to the new structure
is the easier of the two problems faced when converting. The
other is mapping the existing directory-based layout to the
new one, considering dead and resurrected revisions and so on.

>Old clients will still work on new servers, but since the
>mappings will be done by the server, they'll be slower than
>new clients. New clients will store the mappings within the
>CVS directories. This implies that the client/server protocol
>will need to be extended in such a way that a new server will
>recognize a new client. If the client can query the server
>for its version, new clients can also work with old servers.

I don't really know enough about the protocol to comment on
this, but I suspect that the current mapping is somehow
implicit in its implementation. I would assume that the
client/server protocol would need to be redesigned as well,
thus making current clients incompatible with new servers.

>The command "cvs mv" will be added. Upon checkin, a mv command
>will check in a new version of the archive file(s) of the
>affected directory(ies).

A "cvs ln" is needed as well, to copy CVS meta-data from one
project to another for when artifacts become shared. A variant
might also be needed that accepts container names and creates
the proper mapping for the sandbox.

I've been considering a few issues with regard to a new
implementation. First, it's not necessary to lock the RCS
files at all for read-only operations if version numbers are
known beforehand, or if some other means of identification is
available (e.g. a branch/timestamp pair). It might be possible
to implement a lock-free mechanism to control access to the
repository. That said, I've come up with a per-file locking
mechanism that might work, though it's inefficient because
it's filesystem-based.
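For illustration only, here's a quick sketch in Python (chosen
just for brevity, not as an implementation language) of how
such container basenames might be generated. I've substituted
Python's secrets module for the proposed "secure" Mersenne
Twister, since the stock Mersenne Twister is deterministic once
seeded and isn't a safe source of unique names; the function
names are mine.

    import secrets

    def new_container_name():
        # 32 random bytes == 256 bits, rendered as 64 hex digits.
        # A cryptographic source stands in for the "secure"
        # Mersenne Twister mentioned above.
        return secrets.token_hex(32)

    def new_sequential_name(counter):
        # The sequential alternative: simpler and cheaper, but the
        # counter is shared state that must be bumped under a lock,
        # which purely random names avoid.
        return "%064x" % counter

The sequential scheme's hidden cost, a lock around the counter,
is the one argument I can see for random names, opaque as they
are.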
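And here's a hypothetical sketch of what one revision of a
directory container might record, with "0" as the special
top-level container; the layout and field names are my own
invention, not anything in CVS:

    # Container "0" is the agreed-upon root: it maps the top-level
    # names (the arguments to "cvs checkout") to their containers.
    ROOT_CONTAINER = "0"

    # One revision of a directory container maps each basename to
    # the container that holds it, plus its type.
    revision = {
        "Makefile": ("9f86d081884c7d65...", "file"),       # hex truncated
        "src":      ("60303ae22b998861...", "directory"),
    }

    def rename(mapping, old, new):
        # A rename touches only the directory container; the renamed
        # element's own ,v archive is untouched.  Refusing to replace
        # an existing target enforces the add/rename-only restriction
        # discussed above.
        if new in mapping:
            raise KeyError("target %r already exists" % new)
        updated = dict(mapping)
        updated[new] = updated.pop(old)
        return updated

Under this scheme, "cvs mv" commits a new revision of exactly
one (or, when moving across directories, two) directory
containers and nothing else.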
It involves creating a hard link to an RCS file when we want to
commit a change, using RCS on the link to record changes to the
container, then renaming the updated RCS file back to the
original place as the commit completes. This is essentially a
two-phase commit implementation, which has the potential to
make commits truly atomic. (This actually applies to all
changes to RCS files, including tagging!) A sketch of the
mechanism appears at the end of this message.

What's missing is a transaction log that records all of the
affected RCS files, and a crash-recovery tool that either
removes or renames the linked RCS files, depending on how far
down the commit path someone got at the time of the crash. But
that's easy to implement as well.

The down side is that each affected file requires up to three
times the size of its RCS file, plus double the aggregate size
of all of the RCS files updated. Another annoyance is that
things like "cvs log" that operate on sets of revisions may
produce unwanted results, particularly if they run concurrently
with transactions that later abort. But I think that this
problem can be solved as well.

I dug up Dick Grune's third release of his original CVS
implementation and will try some experiments as time permits.
It's a bit of a shock going back to one's roots like that;
there's not a lot of similarity between it and what we now know
as CVS. But it's a good place to start tinkering.
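Here is the promised sketch of the hard-link commit, again in
Python purely for illustration. The ",work" suffix and the
record_revision hook are placeholders of mine; the atomicity
rests on rename(2) being atomic on POSIX filesystems.

    import os

    def two_phase_commit(rcs_path, record_revision):
        # Phase 1: pin the current archive with a hard link, then
        # record the new revision against the link.  record_revision
        # stands in for running RCS; like RCS, it must rewrite its
        # argument as a *new* file (RCS works through its own
        # temporary file), which breaks the hard link and leaves the
        # original archive intact and readable throughout.
        work = rcs_path + ",work"
        os.link(rcs_path, work)
        try:
            record_revision(work)
            # Phase 2: atomic cut-over.  Readers see either the old
            # archive or the new one, never a half-written file.
            os.rename(work, rcs_path)
        except BaseException:
            # Abort/crash path: drop the work file; the original is
            # untouched.  A real implementation would drive this
            # cleanup from the transaction log described above.
            if os.path.lexists(work):
                os.unlink(work)
            raise

The space figures above fall out of this directly: while a
commit is in flight, the old archive, the rewritten copy, and
RCS's own temporary file can all exist at once.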
>--- End of forwarded message from [EMAIL PROTECTED]

_______________________________________________
Info-cvs mailing list
[EMAIL PROTECTED]
http://mail.gnu.org/mailman/listinfo/info-cvs