> On Jul 16, 2019, at 3:34 PM, Jason Merrill <ja...@redhat.com> wrote:
> 
> On Tue, Jul 16, 2019 at 12:18 PM Maxim Kuvyrkov
> <maxim.kuvyr...@linaro.org> wrote:
>> 
>> Hi Everyone,
>> 
>> I've been swamped with other projects for most of June, which gave me time 
>> to digest all the feedback I've got on GCC's conversion from SVN to Git.
>> 
>> The scripts have heavily evolved from the initial version posted here.  They 
>> have become fairly generic in that they have no implied knowledge about 
>> GCC's repo structure.  Due to this I no longer plan to merge them into GCC 
>> tree, but rather publish as a separate project on github.  For now, you can 
>> track the current [hairy] version at 
>> https://review.linaro.org/c/toolchain/gcc/+/31416 .
>> 
>> The initial version of scripts used heuristics to construct branch tree, 
>> which turned out to be error-prone.  The current version parses the entire 
>> history of the SVN repo to detect all trees that start at /trunk@1.  Therefore 
>> all branches in the converted repo converge to the same parent at the 
>> beginning of their histories.
>> 
>> As far as GCC conversion goes, below is what I plan to do and what not to 
>> do.  This is based on comments from everyone in this thread:
>> 
>> 1. Construct GCC's git repo from SVN using same settings as current git 
>> mirror.
>> 2. Compare the resulting git repo with current GCC mirror -- they should 
>> match on the commit hash level for trunk, branches/gcc-*-branch, and other 
>> "normal" branches.
>> 3. Investigate any differences between converted GCC repo and current GCC 
>> mirror.  These can be due to bugs in git-svn or other misconfigurations.
>> 4. Import git-only branches from current GCC mirror.
>> 5. Publish this "raw" repo for community to sanity-check its contents.
> 
> Why not start from the current mirror?  Perhaps a mirror of the mirror?

To check that git-svn is self-consistent and generates the same commits now as it 
did several years ago when you set up the current mirror.
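
To make the check concrete, here is a rough sketch of the comparison I have in mind.  It assumes each repo has already been dumped into a dict of branch name -> list of commit hashes, oldest first (for example from the output of `git rev-list --reverse <branch>`).  The function names and the data layout are made up for illustration; they are not part of the conversion scripts.

```python
from itertools import zip_longest


def first_divergence(mirror, converted):
    """Return (index, mirror_hash, converted_hash) at the first position
    where the two histories differ, or None if they are identical."""
    for index, (a, b) in enumerate(zip_longest(mirror, converted)):
        if a != b:
            return index, a, b
    return None


def compare_branches(mirror_repo, converted_repo):
    """Map each mismatching branch to its first divergence.
    Branches whose histories match completely are omitted."""
    report = {}
    for branch in sorted(set(mirror_repo) | set(converted_repo)):
        diff = first_divergence(mirror_repo.get(branch, []),
                                converted_repo.get(branch, []))
        if diff is not None:
            report[branch] = diff
    return report
```

An empty report would mean the two repos agree on the commit hash level; any entry points at the first commit worth investigating on that branch.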

> 
>> 6. Re-write history of all branches -- converted from svn and git-only -- 
>> see note below [*].
>> 7. Publish this "pretty" repo for community to sanity-check its contents.
>> 8. Update both "raw" and "pretty" repos daily with new commits
>> 9. Fix problems in the "raw" and "pretty" repos as they are reported by the 
>> community.
>> 
>> Once these steps are done, the community could switch from SVN to git by 
>> disabling commits to SVN, waiting for final history to be absorbed by the 
>> "pretty" repo, and deploying the git repo as the official repo.
>> 
>> [*] Note on branch re-writing:
>> During svn->git conversion we have an opportunity to correct some of the 
>> artifacts of current git mirror:
>> 
>> a. Author and committer entries.  These are difficult to get right during 
>> git-svn import process because the tool gives only SVN committer ID without 
>> much else.  We could do much better by matching SVN committer ID with 
>> person's name in the map file, and then searching for person's 
>> current-at-the-time email address in the commit diff.  I.e., mkuvyrkov -> 
>> Maxim Kuvyrkov -> [changelog from 2010's commit] -> ma...@codesourcery.com .
> 
>> c. Since we are re-writing history anyway, it would be nice to convert 
>> "svn-git: svn+ssh://" tags to "svn-git: https://".  We are sure to retain 
>> publicly-visible svn repo accessible via https://, but not as likely to 
>> retain svn+ssh:// interface.
> 
> I am moderately opposed to rewriting trunk and release branch history;
> if we're using git-svn anyway, the benefit would have to be large to
> outweigh the significant inconvenience to all current users of needing
> to switch their local trees over to a new history.

I mostly agree with your point.  My thinking is that the git mirror was never the 
official canonical GCC repo, and if we ever want to get better author/committer 
identities -- this is our chance.
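
For reference, the identity lookup from (a.) could look roughly like the sketch below.  It assumes the map file has been loaded into a dict of SVN committer ID -> name, that the commit diff adds a ChangeLog header line of the usual "date  name  <email>" form, and that we fall back to a placeholder address when no such line is found.  The function name, the sample map and the fallback domain are all assumptions for illustration.

```python
import re

# Hypothetical map file contents: SVN committer ID -> person's name.
AUTHORS = {"mkuvyrkov": "Maxim Kuvyrkov"}


def resolve_identity(svn_id, commit_diff, authors=AUTHORS,
                     fallback_domain="gcc.gnu.org"):
    """Return (name, email) for a commit, preferring the address the
    committer used in the ChangeLog entry added by that commit."""
    fallback = f"{svn_id}@{fallback_domain}"  # assumed placeholder address
    name = authors.get(svn_id)
    if name is None:
        return svn_id, fallback
    # Added ChangeLog header lines look like:
    # "+2010-05-14  Maxim Kuvyrkov  <maxim@example.com>"
    pattern = re.compile(
        r"^\+\d{4}-\d{2}-\d{2}\s+" + re.escape(name) + r"\s+<([^>\s]+)>",
        re.MULTILINE,
    )
    match = pattern.search(commit_diff)
    if match:
        return name, match.group(1)
    return name, fallback
```

Commits that touch no ChangeLog (or use a different name spelling) would need the fallback or manual fixes, which is part of why the result should be published for review first.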

> 
>> b. Re-write tags/ branches into annotated tags.  Note that tags/* are 
>> included into history of several branches via merge or copy commits, so we 
>> would need to re-write history to have proper references to annotated tag 
>> commits in the histories of such branches.
> 
> Missing tags is definitely something to fix about the current mirror.
> I don't think we need to worry about inserting them into branch
> history.

If we don't do this then "git branch -a --contains some/tag" will not work 
correctly.
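
The reason is that "--contains" is an ancestry check: a branch contains a commit only if that commit is reachable from the branch tip through parent links.  A toy model of this, with a hand-written commit graph (all commit and branch names below are invented for illustration):

```python
def ancestors(commit, parents):
    """Return every commit reachable from `commit` (inclusive)
    by following parent links."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def branches_containing(target, branch_tips, parents):
    """Names of branches whose tip has `target` among its ancestors."""
    return sorted(name for name, tip in branch_tips.items()
                  if target in ancestors(tip, parents))


tips = {"trunk": "t3", "rel": "r1"}
# Tag commit sits beside the history: "rel" was copied from the tag in SVN,
# but its first commit points straight at the trunk commit.
unlinked = {"t2": ["t1"], "t3": ["t2"], "tag": ["t2"], "r1": ["t2"]}
# Re-written history: the first commit on "rel" has the tag commit as parent.
linked = {"t2": ["t1"], "t3": ["t2"], "tag": ["t2"], "r1": ["tag"]}
```

With the "unlinked" graph the tag commit is reachable from no branch, so the query comes back empty; with the "linked" graph it reports "rel".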

> 
> We should definitely also rewrite vendor/subdirectory branches into
> multiple branches.

Vendor and subdirectory branches are properly handled by the scripts.  I wonder 
whether re-writing them using tree-filters would produce the same result as the 
git-svn conversions I'm doing.

--
Maxim Kuvyrkov
www.linaro.org


> 
>> Which of these will make it into the final repo is for the community to decide.
>> 
>> Regards,
>> 
>> --
>> Maxim Kuvyrkov
>> www.linaro.org
>> 
>> 
>> 
>>> On May 28, 2019, at 1:31 PM, Maxim Kuvyrkov <maxim.kuvyr...@linaro.org> 
>>> wrote:
>>> 
>>> Hi Everyone,
>>> 
>>> What can I say, I was too optimistic about how easy it would be to convert 
>>> GCC's svn repo to git one branch at a time.  After 2 more weeks and several 
>>> re-writes of the scripts I now know more about GCC's svn history than I 
>>> ever wanted to.
>>> 
>>> The prize for most complicated branch history goes to /branches/ibm/* .  It 
>>> has merges, it has re-creations of branches from /trunk and even an accidental 
>>> deletion of all of IBM's branches.
>>> 
>>> The version of scripts I'm testing right now seems to deal with all of that.
>>> 
>>> Also, to avoid controversy -- I'm working on these scripts to satisfy my 
>>> own curiosity, and to give GCC community another option to choose from for 
>>> the final migration.  If by end of Summer 2019 we have 2-3 git repos to 
>>> choose from, then we are likely to push GCC [kicking and screaming] into 
>>> 2010's by the end of this decade.
>>> 
>>> --
>>> Maxim Kuvyrkov
>>> www.linaro.org
>>> 
>>> 
>>> 
>>>> On May 14, 2019, at 7:11 PM, Maxim Kuvyrkov <maxim.kuvyr...@linaro.org> 
>>>> wrote:
>>>> 
>>>> This patch adds scripts to contrib/ to migrate full history of GCC's 
>>>> subversion repository to git.  My hope is that these scripts will finally 
>>>> allow GCC project to migrate to Git.
>>>> 
>>>> The result of the conversion is at 
>>>> https://github.com/maxim-kuvyrkov/gcc/branches/all . Branches with "@rev" 
>>>> suffixes represent branch points.  The conversion is still running, so not 
>>>> all branches may appear right away.
>>>> 
>>>> The scripts are not specific to GCC repo and are usable for other 
>>>> projects.  In particular, they should be able to convert downstream GCC 
>>>> svn repos.
>>>> 
>>>> The scripts convert svn history branch by branch.  They rely on git-svn to 
>>>> convert individual branches.  Git-svn is a good tool for converting 
>>>> individual branches.  It is, however, either very slow at converting the 
>>>> entire GCC repo, or goes into an infinite loop.
>>>> 
>>>> There are 3 scripts:
>>>> 
>>>> - svn-git-repo.sh: top level script to convert entire repo or a part of it 
>>>> (e.g., branches/),
>>>> - svn-list-branches.sh: helper script to output branches and their parents 
>>>> in bottom-up order,
>>>> - svn-git-branch.sh: helper script to convert a single branch.
>>>> 
>>>> Whenever possible, svn-git-branch.sh uses existing git branches as caches.
>>>> 
>>>> What are your questions and comments?
>>>> 
>>>> The attached is a cleaned-up version, which hasn't been fully tested yet; 
>>>> typos and other silly mistakes are likely.  OK to commit after testing?
>>>> 
>>>> --
>>>> Maxim Kuvyrkov
>>>> www.linaro.org
>>>> 
>>>> 
>>>> <0001-Contrib-SVN-Git-conversion-scripts.patch>
