> On Jul 16, 2019, at 3:34 PM, Jason Merrill <ja...@redhat.com> wrote:
>
> On Tue, Jul 16, 2019 at 12:18 PM Maxim Kuvyrkov
> <maxim.kuvyr...@linaro.org> wrote:
>>
>> Hi Everyone,
>>
>> I've been swamped with other projects for most of June, which gave me time
>> to digest all the feedback I've got on GCC's conversion from SVN to Git.
>>
>> The scripts have heavily evolved from the initial version posted here. They
>> have become fairly generic in that they have no implied knowledge about
>> GCC's repo structure. Due to this I no longer plan to merge them into GCC
>> tree, but rather publish as a separate project on github. For now, you can
>> track the current [hairy] version at
>> https://review.linaro.org/c/toolchain/gcc/+/31416 .
>>
>> The initial version of scripts used heuristics to construct branch tree,
>> which turned out to be error-prone. The current version parse entire
>> history of SVN repo to detect all trees that start at /trunk@1. Therefore
>> all branches in the converted repo converge to the same parent at the
>> beginning of their histories.
>>
>> As far as GCC conversion goes, below is what I plan to do and what not to
>> do. This is based on comments from everyone in this thread:
>>
>> 1. Construct GCC's git repo from SVN using same settings as current git
>> mirror.
>> 2. Compare the resulting git repo with current GCC mirror -- they should
>> match on the commit hash level for trunk, branches/gcc-*-branch, and other
>> "normal" branches.
>> 3. Investigate any differences between converted GCC repo and current GCC
>> mirror. These can be due to bugs in git-svn or other misconfigurations.
>> 4. Import git-only branches from current GCC mirror.
>> 5. Publish this "raw" repo for community to sanity-check its contents.
>
> Why not start from the current mirror? Perhaps a mirror of the mirror?

To check that git-svn is self-consistent and generates the same commits now as
it did several years ago when you set up the current mirror.

>
>> 6. Re-write history of all branches -- converted from svn and git-only --
>> see note below [*].
>> 7. Publish this "pretty" repo for community to sanity-check its contents.
>> 8. Update both "raw" and "pretty" repos daily with new commits
>> 9. Fix problems in the "raw" and "pretty" repos as they reported by the
>> community.
>>
>> Once these steps are done, the community could switch from SVN to git by
>> disabling commits to SVN, waiting for final history to be absorbed by the
>> "pretty" repo, and deploying the git repo as the official repo.
>>
>> [*] Note on branch re-writing:
>> During svn->git conversion we have an opportunity to correct some of the
>> artifacts of current git mirror:
>>
>> a. Author and committer entries. These are difficult to get right during
>> git-svn import process because the tool gives only SVN committer ID without
>> much else. We could do much better by matching SVN committer ID with
>> person's name in the map file, and then searching for person's
>> current-at-the-time email address in the commit diff. I.e., mkuvyrkov ->
>> Maxim Kuvyrkov -> [changelog from 2010's commit] -> ma...@codesourcery.com .
>
>> c. Since we are re-writing history anyway, it would be nice to convert
>> "svn-git: svn+ssh://" tags to "svn-git: https://". We are sure to retain
>> publicly-visible svn repo accessible via https://, but not as likely to
>> retain svn+ssh:// interface.
>
> I am moderately opposed to rewriting trunk and release branch history;
> if we're using git-svn anyway, the benefit would have to be large to
> outweigh the significant inconvenience to all current users of needing
> to switch their local trees over to a new history.

I mostly agree with your point.
My thinking is that the git mirror was never the official canonical GCC repo,
and if we ever want better author/committer identities, this is our chance.

>
>> b. Re-write tags/ branches into annotated tags. Note that tags/* are
>> included into history of several branches via merge or copy commits, so we
>> would need to re-write history to have proper references to annotated tag
>> commits in the histories of such branches.
>
> Missing tags is definitely something to fix about the current mirror.
> I don't think we need to worry about inserting them into branch
> history.

If we don't do this, then "git branch -a --contains some/tag" will not work
correctly.

>
> We should definitely also rewrite vendor/subdirectory branches into
> multiple branches.

Vendor and subdirectory branches are properly handled by the scripts. I wonder
whether re-writing them using tree-filters would produce the same result as the
git-svn conversions I'm doing.

--
Maxim Kuvyrkov
www.linaro.org

>
>> Which of these will make into the final repo is for community to decide.
>>
>> Regards,
>>
>> --
>> Maxim Kuvyrkov
>> www.linaro.org
>>
>>
>>
>>> On May 28, 2019, at 1:31 PM, Maxim Kuvyrkov <maxim.kuvyr...@linaro.org>
>>> wrote:
>>>
>>> Hi Everyone,
>>>
>>> What can I say, I was too optimistic about how easy it would be to convert
>>> GCC's svn repo to git one branch at a time. After 2 more weeks and several
>>> re-writes of the scripts I now know more about GCC's svn history than I
>>> would ever wanted.
>>>
>>> The prize for most complicated branch history goes to /branches/ibm/* . It
>>> has merges, it has re-creation branches from /trunk and even an accidental
>>> deletion of all of IBM's branches.
>>>
>>> The version of scripts I'm testing right now seems to deal with all of that.
>>>
>>> Also, to avoid controversy -- I'm working on these scripts to satisfy my
>>> own curiosity, and to give GCC community another option to choose from for
>>> the final migration.
>>> If by end of Summer 2019 we have 2-3 git repos to
>>> choose from, then we are likely to push GCC [kicking and screaming] into
>>> 2010's by the end of this decade.
>>>
>>> --
>>> Maxim Kuvyrkov
>>> www.linaro.org
>>>
>>>
>>>
>>>> On May 14, 2019, at 7:11 PM, Maxim Kuvyrkov <maxim.kuvyr...@linaro.org>
>>>> wrote:
>>>>
>>>> This patch adds scripts to contrib/ to migrate full history of GCC's
>>>> subversion repository to git. My hope is that these scripts will finally
>>>> allow GCC project to migrate to Git.
>>>>
>>>> The result of the conversion is at
>>>> https://github.com/maxim-kuvyrkov/gcc/branches/all . Branches with "@rev"
>>>> suffixes represent branch points. The conversion is still running, so not
>>>> all branches may appear right away.
>>>>
>>>> The scripts are not specific to GCC repo and are usable for other
>>>> projects. In particular, they should be able to convert downstream GCC
>>>> svn repos.
>>>>
>>>> The scripts convert svn history branch by branch. They rely on git-svn on
>>>> convert individual branches. Git-svn is a good tool for converting
>>>> individual branches. It is, however, either very slow at converting the
>>>> entire GCC repo, or goes into infinite loop.
>>>>
>>>> There are 3 scripts:
>>>>
>>>> - svn-git-repo.sh: top level script to convert entire repo or a part of it
>>>> (e.g., branches/),
>>>> - svn-list-branches.sh: helper script to output branches and their parents
>>>> in bottom-up order,
>>>> - svn-git-branch.sh: helper script to convert a single branch.
>>>>
>>>> Whenever possible, svn-git-branch.sh uses existing git branches as caches.
>>>>
>>>> What are your questions and comments?
>>>>
>>>> The attached is cleaned up version, which hasn't been fully tested yet;
>>>> typos and other silly mistakes are likely. OK to commit after testing?
>>>>
>>>> --
>>>> Maxim Kuvyrkov
>>>> www.linaro.org
>>>>
>>>>
>>>> <0001-Contrib-SVN-Git-conversion-scripts.patch>
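
For illustration, the author lookup described in note (a) of the quoted plan
(SVN committer ID -> name from the map file -> email address found in the
commit diff) might be sketched roughly as below. This is only a sketch and is
not taken from the conversion scripts: the function name, the in-memory map,
and the example.com address are hypothetical, and it assumes the diff contains
a GNU-style ChangeLog header of the form "YYYY-MM-DD  Name  <email>".

```python
import re
from typing import Dict, Optional, Tuple

# Assumed GNU ChangeLog entry header, optionally prefixed by the "+" that
# marks an added line in a diff: "YYYY-MM-DD  Full Name  <email>".
CHANGELOG_HEADER = re.compile(
    r"^\+?\s*\d{4}-\d{2}-\d{2}\s+(?P<name>[^<]+?)\s+<(?P<email>[^>]+)>\s*$"
)


def find_author_email(
    svn_id: str, author_map: Dict[str, str], commit_diff: str
) -> Optional[Tuple[str, str]]:
    """Return (name, email) for an SVN committer ID, or None if not found.

    author_map is a hypothetical stand-in for the map file: it maps an SVN
    committer ID to the person's full name.
    """
    name = author_map.get(svn_id)
    if name is None:
        return None
    # Scan the diff for a ChangeLog header written under that person's name.
    for line in commit_diff.splitlines():
        match = CHANGELOG_HEADER.match(line)
        if match and match.group("name").strip() == name:
            return name, match.group("email")
    return None


if __name__ == "__main__":
    # Hypothetical data; the address is a placeholder, not a real one.
    diff = "+2010-03-04  Maxim Kuvyrkov  <maxim@example.com>\n+\n+\t* foo.c: Fix.\n"
    print(find_author_email("mkuvyrkov", {"mkuvyrkov": "Maxim Kuvyrkov"}, diff))
```

When no matching header is found the function returns None, so a caller could
fall back to whatever address the map file provides.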