Hi Émanuel,

On Sun, May 8, 2016 at 4:40 PM, Émanuel Barry <vgr...@live.ca> wrote:

> Take each X commit (say, every 100th or 1000th commit, or even every
> commit if we decide to be insane^Wprecise), store hashes of all files at
> that revision with possibly the file tree, in a .py file as a list or dict,
> or json or anything you prefer. Then I upload it for you to look at and you
> can compare with the mercurial repo. Or we run the same script on the
> mercurial repo and compare the resulting files.


If we store anything externally, that could start limiting us.

I looked at the problem from this angle: the final cpython git repo has
~10,000 commits on the master branch. That's not a large number to deal
with. The original hg repo should have exactly the same number of commits.
We have to do a diff between each pair of corresponding commits, including
merge commits, and check that their contents are the same. If we encounter
any commit where the git repo differs in content or history from the hg
repo, we alert and fail.
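As a minimal sketch of the content check (assuming each corresponding
revision has been checked out to a working tree; the function names here
are illustrative, not part of any existing tool):

```python
import hashlib
import os

def tree_fingerprint(root):
    """Return one digest over all file paths and contents under root.

    Walks the tree in sorted order so the digest is deterministic:
    two checkouts with identical content yield identical digests.
    VCS metadata directories are skipped, since .hg and .git will
    always differ between the two repos.
    """
    h = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = sorted(d for d in dirnames if d not in (".hg", ".git"))
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            h.update(rel.encode("utf-8"))
            with open(path, "rb") as f:
                h.update(f.read())
    return h.hexdigest()

def revisions_match(hg_checkout, git_checkout):
    """Compare two checkouts of what should be the same revision."""
    return tree_fingerprint(hg_checkout) == tree_fingerprint(git_checkout)
```

Running this over every pair of corresponding revisions (or every Nth, as
you suggested) and failing on the first mismatch would catch any content
divergence; the history/ordering check would be a separate pass over the
commit metadata.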

Since this is a history-checking operation, we could complete it in
O(minutes), or ~1 hour at worst, to validate the repos. This will give us
confidence in the migration, and will help us evaluate multiple hg -> git
repos that have been migrated at different points in time.

This feature will go in this tool:
https://github.com/orsenthil/cpython-hg-to-git , which we will use to
migrate, sync, and validate hg->git repos.
If interested, you could research an efficient way to do the above
operation and submit a pull request against that tool.

HTH,
Senthil
_______________________________________________
core-workflow mailing list
core-workflow@python.org
https://mail.python.org/mailman/listinfo/core-workflow
This list is governed by the PSF Code of Conduct: 
https://www.python.org/psf/codeofconduct