On Thu, Sep 12, 2019 at 9:53 AM Joseph Mirabel <[email protected]> wrote:
> To summarize what is missing:
>
> - there are a lot of "backporting rev[0-9]+" that are not found. I don't
> know what "backporting" means. I just know that "hg log --rev N" for all
> the ones I tested returns "unknown revision".
>

$ hg log -v | grep "backporting"

does not give me that many:

backporting 1784: fixed bug #62

-> For this one, "hg log -r 1784" tells me that the true hash is
0430786c2a6f (but we don't care if we lose one link!)

The following are very old subversion references, we don't care at all:

backporting 964177 (gcc 3.3 fix)
backporting r964165 (gcc 3.3 fixes)
backporting rev 951682 (compilation fix in aligned allocator)
backporting rev 918446: fix MSVC internal compilation error
backporting rev925153 (bugfix in MapBase::coeffRef(int) )
backporting commit 918468 (fix MSVC internal error)

> - I do not catch ranges of revision number like "[0-9]+-[0-9]+". It
> wouldn't be hard to achieve.
>

-> I could not find any meaningful one. I only found two meaningful
"([0-9]+:)([0-9]+)" occurrences like:

6089:76b6c62565a6

In this case, if \2 is a valid hash, then \1 (e.g., "6089:") should be
dropped to enable auto-link creation, see:

https://github.com/jmirabel/eigen_tmp/commit/4325f05a4b60

But that's really no big deal as the true hash has been properly updated,
and there are only 2 occurrences!

> - what about unnamed hg heads? Should we drop them? If not, I would
> appreciate if someone knowing mercurial could name them (with a name valid
> for hg and git).
>

I guess this issue is related to changeset 59a7e404a93c, which is an empty
commit closing a branch. If your script ignores it without additional
issue, then that's fine.

So for me this is all good, I would only care about updating references to
bugs/PR by applying the following substitutions:

"([Bb]ug) (\d+)" -> "\1 #\2"
"bugs (\d+) and (\d+)" -> "bugs #\1 and #\2"
"http://eigen.tuxfamily.org/bz/show_bug.cgi\?id=(\d+)" -> "#\1"
"[Pp]ull [Rr]equest #(\d+)" -> "pull request PR-\1"
"PR #(\d+)" -> "PR-\1"

gael

>
> Joseph
>
>
> On 12/09/2019 at 00:37, Gael Guennebaud wrote:
>
>
>
> On Wed, Sep 11, 2019 at 7:38 PM Joseph Mirabel <[email protected]>
> wrote:
>
>> Dear Eigen developers,
>>
>> - I can convert all references to revisions or mercurial hashes that
>> follow the regex in [2].
>>
>
> I'll look at it more carefully later, but... wow!! this looks very
> promising: https://github.com/jmirabel/eigen_tmp/commit/6e53e31dc2d79da
>
> For comparison, the same commit in our official git-mirror:
> https://github.com/eigenteam/eigen-git-mirror/commit/6ba9310bc2c168
>
> gael
>
>
>> - I did not try to convert URLs although it should not be hard.
>>
>> - I manually edited the author file [5] so that they would fit git
>> author format. If you find yourself in the list and want to update it,
>> you can contact me.
>>
>>
>> It should not be hard to add more rules to the plugin convert_references
>> if anyone feels like doing it.
>>
>>
>> Best,
>>
>> Joseph
>>
>>
>> [1] https://github.com/frej/fast-export.git
>>
>> [2]
>> https://github.com/jmirabel/fast-export/blob/1fdc76e0626acd6adfcc0d900d14f36b459c4798/plugins/convert_references/__init__.py#L16
>>
>> [3] https://github.com/jmirabel/fast-export
>>
>> [4] https://github.com/jmirabel/eigen_tmp.git
>>
>> [5]
>> https://github.com/jmirabel/fast-export/blob/master/eigen/authors_reworked
>>
>>
>>
>> On 11/09/2019 at 18:03, Gael Guennebaud wrote:
>> > To prepare the migration from bitbucket, I started to play a bit with
>> > its API to see what could be done. So far I've quickly drafted two
>> > (ugly) python scripts to archive the forks and pull-requests.
>> > Since this is a one shot for us, I did not care about robustness,
>> > safety, generality, beauty, etc.
>> >
>> > You can see them there:
>> > https://gitlab.com/ggael/bitbucket-migration-tools and contribute!
>> >
>> > ** Forks **
>> >
>> > You can see the summary of the fork script
>> > there: http://manao.inria.fr/eigen_tmp/archive_forks_log.html
>> >
>> > The hg clones (history+checkout) represent 20GB, maybe 12GB if we
>> > remove the checkouts. Among the 460 forks, 214 seem to have no change
>> > at all (according to "hg out") and could be dropped. I don't know yet
>> > where to host them though.
>> >
>> > This script can be run incrementally.
>> >
>> >
>> > ** Pull-Requests **
>> >
>> > You can find the output of the pull-requests script
>> > there: http://manao.inria.fr/eigen_tmp/pullrequests/
>> >
>> > There is a short summary, and then for each PR a static .html file
>> > plus diff/patch files, and other details. For instance,
>> > see: http://manao.inria.fr/eigen_tmp/pullrequests/OPEN/686/pr686.html
>> >
>> > Currently this script cannot be run incrementally. You have to run it
>> > just before closing the respective repository!
>> >
>> > Also, this script does not grab inline comments. Only the main
>> > discussion is archived. Inline comments can be obtained by iterating
>> > over the "activity" pages, but I don't think that's worth the effort
>> > because they would be difficult to exploit anyway.
>> >
>> >
>> > ** hg to git **
>> >
>> > As discussed in the other thread, if we switch from hg to git, then
>> > all hashes will have to be updated. Generating a map file is easy, and
>> > thus updating the links/hashes in bug comments and PR comments should
>> > not be too difficult (we only have to figure out the right regex to
>> > catch all variants).
>> >
>> > However, updating the hashes within the commit messages will require
>> > rewriting the whole history in a careful order. Does anyone here
>> > feel brave enough to write such a script?
>> > If not, I guess we could
>> > live with an online PHP script doing the hash conversion on demand. I
>> > don't think we'll have to follow such hashes so frequently.
>> >
>> > cheers,
>> > gael
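
The hash update discussed in the "hg to git" part of the original message
can be sketched as follows. This is only an illustration under stated
assumptions: the mapping is a plain dict from full hg hashes to full git
hashes, the hash values are made up, and the names HG_TO_GIT and
remap_hashes are hypothetical. It is not the fast-export plugin, and it
does not address the ordering problem of rewriting history.

```python
import re

# Hypothetical map from full Mercurial hashes to full Git hashes.
# Both values are invented for illustration; a real map would be
# generated during the hg -> git conversion.
HG_TO_GIT = {
    "1234567890ab" + "c" * 28: "fedcba098765" + "d" * 28,
}

# Candidate hashes: 12 to 40 lowercase hex digits standing alone.
HASH_RE = re.compile(r"\b[0-9a-f]{12,40}\b")


def remap_hashes(message, hg_to_git):
    """Replace known hg hashes (full or abbreviated) by git hashes."""

    def repl(match):
        short = match.group(0)
        # Linear scan is fine for a sketch; a real tool would index prefixes.
        candidates = [g for h, g in hg_to_git.items() if h.startswith(short)]
        # Rewrite only unambiguous abbreviations; leave everything else as is.
        if len(candidates) == 1:
            # Keep the same abbreviation length as in the original message.
            return candidates[0][: len(short)]
        return short

    return HASH_RE.sub(repl, message)
```

Unknown hashes are left untouched, so a lost link stays visible in the
message instead of being silently replaced.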

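
The bug/PR substitutions listed in the top reply could be applied to a
commit message roughly as below. This is a sketch, not the
convert_references plugin: the names SUBSTITUTIONS and rewrite_references
are hypothetical, the first replacement is written as "\1 #\2" so that the
word "bug" stays in front of the number, and the dots in the bugzilla URL
are escaped (an assumption about the intent).

```python
import re

# (pattern, replacement) pairs, in the order given in the message.
SUBSTITUTIONS = [
    (r"([Bb]ug) (\d+)", r"\1 #\2"),
    (r"bugs (\d+) and (\d+)", r"bugs #\1 and #\2"),
    (r"http://eigen\.tuxfamily\.org/bz/show_bug\.cgi\?id=(\d+)", r"#\1"),
    (r"[Pp]ull [Rr]equest #(\d+)", r"pull request PR-\1"),
    (r"PR #(\d+)", r"PR-\1"),
]


def rewrite_references(message):
    """Apply every substitution in turn to one commit message."""
    for pattern, replacement in SUBSTITUTIONS:
        message = re.sub(pattern, replacement, message)
    return message


print(rewrite_references("backporting 1784: fixed bug 62, see PR #686"))
```

Text that is already in the target form, such as "bug #62" or "PR-686",
matches none of the patterns, so running the function twice gives the same
result as running it once.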