On 01/20/2013 09:17 PM, Chris Rorvick wrote:
> I probably won't be sending any more patches on this.  My hope was to
> get cvsimport-3 (w/ cvsps as the engine) in a state such that one
> could transition from the previous version seamlessly.  But the break
> in t9605 has convinced me this is not worth the effort--even in this
> trivial case cvsps is broken.  The fuzzing logic aggregates commits
> into patch sets that have timestamps within a specified window and
> otherwise matching attributes.  This aggregation causes file-level
> commit timestamps to be lost and we are left with a single timestamp
> for the patch set: the minimum for all contained CVS commits.  When
> all commits have been processed, the patch sets are ordered
> chronologically and printed.
> The problem is that a CVS commit is rolled into a patch set
> regardless of whether the patch set's timestamp falls within the
> adjacent CVS file-level commits.  Even worse, since the patch set
> timestamp changes as subsequent commits are added (i.e., it's always
> picking the earliest) it is potentially indeterminate at the time a
> commit is added.  The result is that file revisions can be reordered
> in the resulting Git import (see t9605).  I spent some time last week
> trying to solve this but I couldn't think of anything that wasn't a
> substantial re-work of the code.
> I have never used cvs2git, but I suspect Eric's efforts in making it a
> potential backend for cvsimport are a better use of time.

Thanks for your explanation of how cvsps works.

This is roughly how cvs2svn used to work years ago, prior to release
2.x.  In addition it did a number of things to try to tweak the
timestamp ordering to avoid committing file-level commits in the wrong
order.  It never worked 100%; each tweak that was made to fix one
problem created another problem in another scenario.

cvs2svn/cvs2git 2.x takes a very different approach.  It uses a
timestamp threshold along with author and commit-message matching to
find the biggest set of file-level commits that might constitute a
repository-level commit.  But then it checks the proto-commits to see if
they violate the ordering constraints imposed by the individual
file-level commits.  For example, if the initial grouping gives the
following proto-commits:

proto-commit 1: a.txt 1.1        b.txt 1.2

proto-commit 2: a.txt 1.2        b.txt 1.1

then it is apparent that something is wrong, because a.txt 1.1
necessarily comes before a.txt 1.2 whereas b.txt 1.1 necessarily comes
before b.txt 1.2 (CVS can at least be relied on to get this right!) and
therefore there is no consistent ordering of the two proto-commits.
More generally, the proto-commits have to form a directed acyclic graph,
whereas this graph has a cycle 1 -> 2 -> 1.  When cvs2svn/cvs2git finds
a cycle, it uses heuristics to break up one or more of the proto-commits
to break the cycle.  In this case it might split proto-commit 1 in two:

proto-commit 1a: a.txt 1.1

proto-commit 2:  a.txt 1.2        b.txt 1.1

proto-commit 1b:                  b.txt 1.2

Now it is possible to commit them in the order 1a,2,1b.  (Exactly this
scenario is tested in t9603.)
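
To make the example concrete, here is a minimal Python sketch of the
ordering check.  It is not cvs2svn's actual code: the names rev_key,
build_graph, and order are invented for illustration, and it assumes
Python 3.9+ for the standard graphlib module.  Each proto-commit maps
file names to revisions, and for every file the proto-commit holding the
earlier revision must precede the one holding the later revision.

```python
from graphlib import TopologicalSorter, CycleError


def rev_key(rev):
    # "1.10" -> (1, 10), so that revisions compare numerically.
    return tuple(int(part) for part in rev.split("."))


def build_graph(protos):
    """Map each proto-commit to the set of proto-commits that must precede it."""
    preds = {name: set() for name in protos}
    by_file = {}
    for name, files in protos.items():
        for path, rev in files.items():
            by_file.setdefault(path, []).append((rev_key(rev), name))
    for revs in by_file.values():
        revs.sort()
        # Consecutive revisions of one file impose an ordering edge.
        for (_, earlier), (_, later) in zip(revs, revs[1:]):
            if earlier != later:
                preds[later].add(earlier)
    return preds


def order(protos):
    # Raises CycleError if no consistent ordering exists.
    return list(TopologicalSorter(build_graph(protos)).static_order())


# The initial grouping from the example: 1 -> 2 -> 1 is a cycle.
protos = {
    "1": {"a.txt": "1.1", "b.txt": "1.2"},
    "2": {"a.txt": "1.2", "b.txt": "1.1"},
}
try:
    order(protos)
    cyclic = False
except CycleError:
    cyclic = True

# After splitting proto-commit 1 by hand into 1a and 1b.
split = {
    "1a": {"a.txt": "1.1"},
    "2": {"a.txt": "1.2", "b.txt": "1.1"},
    "1b": {"b.txt": "1.2"},
}
print(cyclic)        # True
print(order(split))  # ['1a', '2', '1b']
```

The split here is done by hand; choosing which proto-commit to split,
and where, is the job of the heuristics.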

Of course a typical proto-commit graph often contains far more
complicated cycles, but the approach remains the same: split
proto-commits up as necessary until the graph is acyclic.  One can
quibble about the heuristics that cvs2svn/cvs2git uses to break up
proto-commits.  But the final result of the algorithm is *guaranteed* to
be consistent with the file-level CVS history and also self-consistent.
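
The "split until acyclic" loop can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not what
cvs2svn/cvs2git does: the splitting rule below (explode the first
multi-file proto-commit found in the cycle into one proto-commit per
file) is a deliberately naive stand-in for the real heuristics, and the
names predecessors and order_with_splitting are hypothetical.  It
terminates because every cycle must involve at least one multi-file
proto-commit, and each split strictly reduces the number of those.

```python
from graphlib import TopologicalSorter, CycleError


def rev_key(rev):
    # "1.10" -> (1, 10), so that revisions compare numerically.
    return tuple(int(part) for part in rev.split("."))


def predecessors(protos):
    """Map each proto-commit to the proto-commits that must come before it."""
    preds = {name: set() for name in protos}
    by_file = {}
    for name, files in protos.items():
        for path, rev in files.items():
            by_file.setdefault(path, []).append((rev_key(rev), name))
    for revs in by_file.values():
        revs.sort()
        for (_, earlier), (_, later) in zip(revs, revs[1:]):
            if earlier != later:
                preds[later].add(earlier)
    return preds


def order_with_splitting(protos):
    """Return (ordering, final proto-commits), splitting until acyclic."""
    protos = {name: dict(files) for name, files in protos.items()}
    while True:
        try:
            sorter = TopologicalSorter(predecessors(protos))
            return list(sorter.static_order()), protos
        except CycleError as exc:
            cycle = exc.args[1]  # list of nodes forming the cycle
            # Naive heuristic (an assumption of this sketch): split the
            # first multi-file proto-commit in the cycle, one file each.
            victim = next(n for n in cycle if len(protos[n]) > 1)
            for path, rev in protos.pop(victim).items():
                protos[f"{victim}/{path}"] = {path: rev}


sequence, final = order_with_splitting({
    "1": {"a.txt": "1.1", "b.txt": "1.2"},
    "2": {"a.txt": "1.2", "b.txt": "1.1"},
})
print(sequence)
```

Whichever proto-commit gets split, every file's revisions appear in
ascending order in the resulting sequence, which is the consistency
property described above.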

I am skeptical that a simpler approach will ever work 100%.


Michael Haggerty