On 12 February 2016 at 03:07, Brett Cannon <br...@python.org> wrote:
> On Thu, Feb 11, 2016, 16:43 Nicolás Alvarez <nicolas.alva...@gmail.com>
> wrote:
>> I tried fast-export, and I don't really see anything wrong with the
>> repository. The size is 221MB.

One thing I’m slightly curious about is whether the result differs from
<https://github.com/python/cpython> or from other conversions, and if so,
what the differences are. The differences could be serious (mangled
history), or they could be trivial things like stripping trailing
newlines from commit messages, or skipping commits that don’t change any
files.

>> It depends on how crazy you want to go. For example, SVN-era merges
>> don't appear as merges, but looks like some SVN-era branches don't
>> exist in Hg to begin with (Would I need to get cpython-fullhistory?
>> Cloning it gives me a 400 Bad Request). Do we care about that?
>
> Good question. If you are not an even clone it then that shows how much
> people who are. Honestly I wouldn't worry since we have the history in the
> hg repo (converting from svn was necessary to have it available without the
> server).

I care a bit. If I get the time, I would like to figure out a robust way
to convert the Subversion history to Git so that the svnmerge
information is included as proper merges.

Another concern for me is that some of the useful history is not even in
Mercurial. For example, <https://hg.python.org/lookup/r70152> is an
svnmerge from ^/python/branches/io-c into ^/python/branches/py3k, but
the Mercurial repository doesn’t have the branch history, so all the
merged-in Subversion revisions such as r68683 are missing.

Some other highlights from my quest to investigate the holy Subversion
repository (I can post my full notes somewhere if people are
interested):

* It is nice to have a local mirror of the Subversion repository so that
  experimenting with different options and programs isn’t horribly slow.
  But I don’t want to mirror everything or overload the server, because
  there are other projects stored in the repository that seem to take up
  a lot of space (and download time).

* What is the story with the cpython-fullhistory Mercurial repository?
  On the surface it almost looks like an out-of-date copy of the main
  repository, but I notice some subtle differences, e.g. the revision
  ids for early tags are different, and a v1.0.0 tag is added.

* Some Subversion revisions actually merge stuff from outside the Python
  tree (e.g. <https://hg.python.org/lookup/r88662> from
  ^/sandbox/trunk/2to3/lib2to3 into
  ^/branches/release27-maint/Lib/lib2to3). Not sure if it is worth
  trying to salvage these merges; I never noticed them when working on
  Python.

>> Or, changes that come from non-committers could have their Author
>> field modified, maybe based on the ACKS file modification. It's
>> feasible but will take time and manual work. Do we care about that?
>
> That would be great but too much effort.

I think it would not be worth it, and could even be detrimental. You
would be trying to guess based on incomplete and unreliable information.
Maybe one person wrote a test, another wrote the implementation, and a
third wrote the documentation, but it was all committed at once. Maybe
the author was already in ACKS and the committer did not mention who the
author was in the message. I think it is safer not to pretend the author
field is always accurate.
_______________________________________________
core-workflow mailing list
core-workflow@python.org
https://mail.python.org/mailman/listinfo/core-workflow
This list is governed by the PSF Code of Conduct:
https://www.python.org/psf/codeofconduct