On 12/19/2013 02:11 AM, Johan Herland wrote:
> On Thu, Dec 19, 2013 at 12:44 AM, Michael Haggerty <mhag...@alum.mit.edu>
> wrote:
>> A correct incremental converter could be done (as long as the CVS users
>> don't literally change history retroactively) but it would be a lot of work.
>
> Although I agree with that sentence as it is stated, I also believe
> that the parenthesized condition rules out a _majority_ of CVS repo of
> non-trivial size/history. So even though a correct incremental
> converter could be built, it would be pretty much useless if it did
> not gracefully handle rewritten history. And in the face of rewritten
> history it becomes pretty much impossible to define what a "correct"
> conversion should even look like (not to mention the difficulty of
> actually implementing that converter...).

A correct conversion would, conceptually, take a diff between the old
CVS history and the new CVS history (I'm talking about the history as
a whole, not a diff between two changesets), figure out what had
changed, and then figure out what Git commits to make to effect the
same conceptual changes in Git-land. This means that the final Git
history would have to depend not only on the current entirety of the
CVS history, but also on what the CVS history *was* during previous
incremental imports and how the tool chose to represent that history
in Git in previous rounds.

There is a tradeoff here. The smarter the tool is, the fewer
restrictions would have to be placed on what people can do in CVS.
For example, it wouldn't be unreasonable to impose a rule that people
are not allowed to move files within the CVS repository (e.g., to fake
move-file-with-history) after the CVS <-> Git bridge is in use.
(Abuses of the history that occurred *before* the first incremental
conversion, on the other hand, wouldn't be a problem.) If the user of
the incremental tool has *no* influence on how his colleagues use CVS,
then the tool would have to be very smart and/or the user might
sometimes be forced to do another from-scratch conversion.

> Here are just a couple of things a CVS user can do (and that happened
> fairly regularly at my previous $dayjob) that would make life
> difficult for an incremental converter (and that also makes stable
> output from a non-incremental converter hard to solve in practice):
>
> - A user "deletes" $file from $branch by simply removing the $branch
> symbol on $file (cvs tag -B -d $branch $file). CVS stores no record of
> this. Many non-incremental importers will see $file as never having
> existed on $branch. An incremental importer starting from a previously
> converted state, must somehow deal with that previous state no longer
> existing from the POV of CVS.
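
To make the "diff between the old and new CVS history" idea a little more concrete, here is a minimal Python sketch of just its last step, restricted to the contents of a single branch or tag. Everything in it is hypothetical: a tree is modeled as a plain dict from file path to CVS revision number, and `fixup_actions` is an illustrative name, not part of cvs2git or any other tool. Given the tree the Git branch already has and the tree CVS reports now, it lists the changes for one extra commit to be appended on top. A file whose branch tag silently vanished, as in the case just quoted, shows up as an "rm".

```python
from typing import Dict, List, Tuple

# Hypothetical model: a tree maps file path -> CVS revision number,
# e.g. {"src/main.c": "1.4.2.3"}.
Tree = Dict[str, str]
Action = Tuple[str, str, str]  # (kind, path, revision)


def fixup_actions(git_tip: Tree, cvs_now: Tree) -> List[Action]:
    """List the changes a synthetic commit on top of an existing Git branch
    must make so that its tree matches what CVS reports now.

    git_tip: tree of the Git branch as left by earlier importer runs
             (plus any ordinary commits replayed during this run).
    cvs_now: what checking out the same branch or tag from CVS gives now.
    Nothing already in Git is touched; the result describes a new commit.
    """
    actions: List[Action] = []
    # Files that silently left the branch (e.g. `cvs tag -B -d $branch $file`).
    # CVS keeps no record of this, so the only option is a synthetic "git rm".
    for path in sorted(git_tip.keys() - cvs_now.keys()):
        actions.append(("rm", path, git_tip[path]))
    # Files that are new or now point at a different revision, e.g. because
    # a tag was moved with `cvs tag -F -r $new_rev $tag $file`.
    for path in sorted(cvs_now.keys()):
        if git_tip.get(path) != cvs_now[path]:
            actions.append(("set", path, cvs_now[path]))
    return actions


if __name__ == "__main__":
    before = {"a.c": "1.3", "b.c": "1.2", "c.c": "1.1"}
    after = {"a.c": "1.3", "c.c": "1.2"}
    print(fixup_actions(before, after))
    # → [('rm', 'b.c', '1.2'), ('set', 'c.c', '1.2')]
```

The sketch deliberately appends a fixup instead of amending earlier commits, so Git history that was already published stays intact. It does not replay ordinary CVS commits, and it does not choose a date for the synthetic commit. A real importer would have to do both.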

No problem; the tool could just add a synthetic commit "git rm"ming the
file from the branch. It wouldn't know *when* the file was deleted, so
it would have to pick a plausible date between the time of the last
incremental conversion and the time of the one that discovers that the
branch tag has been removed from the file. The resulting Git history
would contain more complete information than CVS's history.

> - A user moves a release tag on a few files to include a late bugfix
> into an upcoming release (cvs tag -F -r $new_rev $tag $file). There
> might be no single point in time where the tagged state existed in the
> repo, it has become a "Frankentag". You could claim user error here,
> and that such shortcuts should not happen, but that doesn't really
> prevent it from ever happening. Recreating the tree state of the
> Frankentag in Git is easy, but what kind of history do you construct
> to lead up to that tree?

Frankentags (tags that include file versions that didn't occur
contemporaneously) can occur even with one-time CVS->Git conversions.
The only way to handle them is to create a Git branch representing the
tag, base it at a plausible Git commit, and then (on the branch) issue
a fixup commit that makes the contents of the branch equal to the
contents of the CVS tag. This is a problem that cvs2git already
handles.

A hypothetical incremental importer would have to notice the changes
in the branch contents between the previous conversion and the current
one, and create commits on the branch to bring it in line with the
current contents. This is no uglier than what a one-shot conversion
already has to do.

> - A modularized project develops code on HEAD, and make regular
> releases of each module by tagging the files in the module dir with
> "$modulename-$version". Afterwards a project-wide "stable" tag is
> moved on that subset of files to include the new module release into
> the "stable" tag.
> ("stable" is conceptually a branch, but the CVS
> mechanism used here is still the tag, since CVS branches cannot
> "follow" eachother like in Git). This is pretty much the same
> Frankentag scenario as above, except that in this case it might be
> considered Best Practice (it was at our $dayjob), and not a
> shortcut/user error made by a single user.

Same problem and same solution as above, as far as I can see.

> (None of these examples even involve the "cvs admin" which allows you
> to do some truly scary and demented things to your CVS history...)

Even some of these might be permitted. For example:

* Obsoleting already-converted revisions: it's a pretty stupid thing to
  do in most cases, and the tool could just ignore such events,
  retaining the history in Git. If the revisions were obsoleted because
  they contained proprietary information or something, then you've got
  a bigger problem on your hands, but one that you would have even if
  you were using pure Git.

* Retroactive changes to log messages: would probably have to be
  ignored or handled via notes.

* Changes to the "default branch" (another brain-dead CVS feature
  related to vendor branches): I'd have to think about it. But handling
  vendor branches is already difficult for a one-time converter because
  CVS retains too little info (but cvs2git does it except in the most
  ambiguous cases). An incremental importer would have *more*
  information than a one-shot importer, because it would have a hope of
  catching the change to the default branch at roughly the time it
  occurred.

> My point here is that people will use whatever available tools they
> have to solve whatever problems they are currently having. And when
> CVS is your tool, you will sooner or later end up with a "solution"
> that irrevocably rewrites your CVS history.

Yes, but I maintain that an incremental importer could keep a Git
history that is consistent with the CVS history in the sense that:

1. the result of checking out any branch or tag, right after a run of
   the importer, gives the same results as checking the same branch or
   tag out of CVS.

2. the Git history from one run is added to (never rewritten) by the
   next run.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/