Hi Émanuel,

On Sun, May 8, 2016 at 4:40 PM, Émanuel Barry <vgr...@live.ca> wrote:
> Take each X commit (say, every 100th or 1000th commit, or even every
> commit if we decide to be insane^Wprecise), store hashes of all files at
> that revision with possibly the file tree, in a .py file as a list or dict,
> or json or anything you prefer. Then I upload it for you to look at and you
> can compare with the mercurial repo. Or we run the same script on the
> mercurial repo and compare the resulting files.

If we store anything externally, that could start limiting us.

I looked at the problem from this angle: the final cpython git repo has ~10,000 commits on the master branch. That's not a large number to deal with. The original hg repo should have exactly the same number of commits. We have to do a diff between each pair of corresponding commits, including merge commits, and check whether their contents are the same. If we encounter any commit where the git repo differs in content or history from the hg repo, we alert and fail. Since this is a history-checking operation, we should be able to complete it in O(minutes), or at most ~1 hour, to validate the repos.

This will give us confidence in the migration, and will help us evaluate multiple hg -> git repos that have been migrated at different points in time.

This feature will go in this tool: https://github.com/orsenthil/cpython-hg-to-git , which we will use to migrate, sync, and validate hg->git repos. If you are interested, you could research an efficient way to do the above operation and submit a pull request against that tool.

HTH,
Senthil
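For the content-comparison step, one minimal sketch (not the tool's actual implementation; the function name `tree_fingerprint` and the approach of hashing checked-out working trees are my own assumptions) would be to compute a {path: content-hash} map for a checkout of each commit from both repos and compare the maps:

```python
import hashlib
import os


def tree_fingerprint(root):
    """Return {relative_path: sha1_of_contents} for every file under root.

    If the migration preserved content, a checkout of commit N from the
    hg repo and a checkout of the corresponding commit from the git repo
    should produce identical fingerprints.  (Illustrative sketch only;
    a real run would checkout each revision with `hg update -r N` and
    `git checkout <sha>` before calling this.)
    """
    fingerprint = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip VCS metadata -- only working-tree content should be compared.
        dirnames[:] = [d for d in dirnames if d not in ('.git', '.hg')]
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, 'rb') as f:
                digest = hashlib.sha1(f.read()).hexdigest()
            fingerprint[os.path.relpath(path, root)] = digest
    return fingerprint
```

Comparing two fingerprints with `==` catches added, removed, and modified files in one shot; any mismatch would be the point where the validator alerts and fails.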
_______________________________________________
core-workflow mailing list
core-workflow@python.org
https://mail.python.org/mailman/listinfo/core-workflow
This list is governed by the PSF Code of Conduct: https://www.python.org/psf/codeofconduct