On 1/25/2011 4:07 PM, Jamon Camisso wrote:
For anyone who is interested, here is a list of all relevant directories
from SVN, including those that were deleted at some point in the past.
The plan is to map what Colin has outlined below to directories in this
file, and then to convert each to a tag, branch, or master branch
depending on where it needs to live in Git.

Responding to myself here, and would like to hear from people about the following:

Justin and I have been working on importing SVN into Git this week, with a fair amount of success. We managed to cut infusion down to about 22-24 MB by removing extraneous psd files from the repository.

However, in shuffling repositories and branches around, we have discovered that the tool being used, svn-all-fast-export[1][2], does not carry SVN commits that only touch empty directories over into the Git repository. This behaviour is by design - both Git and Mercurial explicitly do not support tracking directories.

This feature (or bug depending on which side of the fence is most attractive or comfortable) means that where historical changes to SVN like the move from /utoronto/fluid to /fluid occurred, the particular commit tracking that change is not present in Git.

One of the goals during this migration to Git is to preserve as much history as possible in the various repositories that are being forked. This attempt at maintaining the historical integrity of Fluid's source code repositories will ensure that future members or external participants in the Fluid community will have access to relevant information about the historical development of various projects.

With all that in mind, Justin and I can think of a few options that are or will be more or less palatable to those who have read this far:

Option 1) Stick with SVN. Unlikely. This choice would not be in keeping with the distributed collaborative nature of Fluid. As such it would be a very unsavory outcome.

Option 2) Use svn-all-fast-export as it currently runs, with the proviso that any SVN commit of an empty directory or directories will be elided from the history of the repository. This option is semi-palatable in that the final repositories would look and behave exactly as if they were created in Git in the first place.

Option 3) Convert repositories using svn-all-fast-export and run "git commit --amend" on each commit in question. Said commits can be found by running svn-all-fast-export with full rule debugging output enabled, piping that output to a log file, and extracting the relevant lines with grep:

grep -E "Exporting revision [0-9]+.*nothing to do" import.log

That output (of 4286 commits) could then be matched to specific commits that consisted solely of A/D (add/delete) changes to directories in SVN. For example, r4124-4126 is one such series of commits.
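To give a rough idea of how that matching could be automated, here is a minimal Python sketch. It assumes each skipped revision shows up in import.log as a single line matching the grep pattern above; the real debug output may be laid out differently, and the function names are made up for illustration. It collects the skipped revision numbers and collapses them into consecutive runs, so that each run can be attached to the next revision that was actually exported.

```python
import re

# Assumed log line shape, inferred from the grep pattern above; the real
# svn-all-fast-export debug output may differ.
NOTHING_TO_DO = re.compile(r"Exporting revision (\d+).*nothing to do")


def skipped_revisions(lines):
    """Return the sorted revision numbers reported as 'nothing to do'."""
    revs = set()
    for line in lines:
        match = NOTHING_TO_DO.search(line)
        if match:
            revs.add(int(match.group(1)))
    return sorted(revs)


def group_consecutive(revs):
    """Collapse a sorted list of revisions into (first, last) runs."""
    runs = []
    for rev in revs:
        if runs and rev == runs[-1][1] + 1:
            # Extend the current run by one revision.
            runs[-1] = (runs[-1][0], rev)
        else:
            # Start a new run.
            runs.append((rev, rev))
    return runs
```

Calling skipped_revisions(open("import.log")) and passing the result to group_consecutive() would give the runs to review by hand. Whether a run really consists only of directory adds and deletes would still need checking against svn log.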

Whereas each Git commit would initially look like the following:

commit ec2571d0833cbd72fa42d471ba2acdbe9ece71dd
Author: Joseph Scheuhammer <[email protected]>
Date:   Fri May 18 15:56:36 2007 +0000

    Initial Fluid branch of Berkeley's Gallery Tool

    svn path=/utoronto/fluid/gallery/; revision=4126

The affected commits can then be edited to look like this:

    svn path=/utoronto/fluid/gallery/; revision=4124,4125,4126
    Extra comment here pointing to Wiki, or SVN, or a file in Git
    outlining changes to the repository
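As a sketch of what that edit could look like if done programmatically rather than by hand, the Python below rewrites a commit message of the shape shown above. It assumes the message carries a single "svn path=...; revision=N" line; the function name and the note text are hypothetical. Something like this could be driven by "git filter-branch --msg-filter", though that is only one possible approach, and any rewrite of this kind changes the hashes of the affected commits and their descendants.

```python
import re

# Matches the "svn path=...; revision=N" line that the importer appends.
SVN_TRAILER = re.compile(
    r"^([ \t]*svn path=.*; revision=)(\d+)[ \t]*$", re.MULTILINE
)


def fold_revisions(message, extra_revs, note):
    """Add skipped SVN revisions and a pointer note to a commit message.

    Returns the message unchanged if no svn trailer line is found.
    """
    def rewrite(match):
        # Merge the revision already recorded with the skipped ones.
        revs = sorted(set(extra_revs) | {int(match.group(2))})
        rev_list = ",".join(str(r) for r in revs)
        return match.group(1) + rev_list + "\n" + note

    new_message, count = SVN_TRAILER.subn(rewrite, message, count=1)
    return new_message if count else message
```

For the example above, fold_revisions(message, [4124, 4125], "See the wiki for directory changes") would turn revision=4126 into revision=4124,4125,4126 and add the note on the following line.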

Option 4) Hack on svn-all-fast-export to make it do something with directory modifications. This option would likely take a fair amount of time and work to get it working just right, and is not in keeping with the fundamental design of Git.

Option 5) Use a different tool altogether, like git-svn, or the original svn2git tool. These tools are not nearly as sophisticated as svn-all-fast-export in that they are a) incredibly slow and b) unable to track changes to a file's location between directories, or to handle historically deleted directories, the same way that svn-all-fast-export does.

My first preference would be Option 3. However, successfully mapping commits of empty directories to preceding commits depends on how much information can be extracted and correlated programmatically. If there is too much manual work required then my other preference would be Option 2.

Option 2 is viable and would be the faster of the two. This option takes into account the fact that SVN will still be online. I would imagine that anyone who is interested enough in who created an empty directory would probably be willing to do the work of quickly running an svn log -r0001 on the repository and extracting the information that way.

The fact that not all information is being imported from SVN to Git (Photoshop psd files for example) makes option 2 that much more compelling in that it would take very little time to freeze SVN and just do the conversion.

In the end, Options 2 and 3 both preserve information about empty directories, albeit in two different locations. Whereas the former retains an intact record in SVN, the latter entails taking small liberties with the historical record in Git. However, in both cases, the fact that committer X created directory Y can still be gleaned from a well documented location that is easy to find for those who are interested in such information.

tl;dr there is no easy way to import empty directories into Git. Option 2 is less disruptive and faster, while leaving information in multiple locations. Option 3 will require some small amount of historical revisionism, while retaining what history and files are deemed important in one repository format.

Feedback is welcome at this point. I imagine Colin and Antranig will be especially interested in sharing their thoughts.

Regards, Jamon

[1] http://packages.debian.org/testing/main/svn-all-fast-export
[2] svn-all-fast-export has been forked and named svn2git. The confusing part is that there is a Ruby project with the same name that precedes the fork.