On Mon, Aug 26, 2019 at 6:05 AM David Tellenbach <
[email protected]> wrote:

> I somewhat doubt that any existing hg->git converters automatically
> translate these hashes, but I'd be very happy if someone finds out
> otherwise. Changing these manually is definitely not an option.
>
> I might have good news on this one: We are apparently not the only project
> that works on migrating from Mercurial to Git. The OpenJDK project (a free
> implementation of the Java platform) has created Skara, a set of tools to
> handle all kinds of stuff related to contributing to OpenJDK (
> https://github.com/openjdk/skara). Some of the tools could be really
> helpful for our issues (see https://openjdk.java.net/jeps/357).
>
> The relevant tool seems to be git-openjdk-import, which is used to import
> from Mercurial to Git. I just had a brief glance at the code, but it seems
> to be very generic and does not seem to contain OpenJDK-related stuff at
> all. The interesting part is the following paragraph from
> https://openjdk.java.net/jeps/357
>
> We've also prototyped a new tool, git-translate. This tool uses a file
> called .hgcommits that is generated by the conversion tools and committed
> to the Git repositories. This file contains a sequence of lines, each of
> which contains two hexadecimal hashes: the first is the hash of a Mercurial
> changeset and the second is the hash of the Git commit resulting from
> converting that Mercurial changeset. The tool git-translate simply
> queries the file .hgcommits
>
> I've been pondering how to implement something similar as a custom tool.
If you can get a mapping of the hashes, then writing something custom
around git filter-branch <https://git-scm.com/docs/git-filter-branch> should
be straightforward. Updating the mapping as commits are rewritten takes a
bit of thought, but I don't think it's hard. Using somebody else's tool
might be easier though.
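
To make the lookup side concrete, here is a minimal Python sketch. It assumes
the mapping is available as text with one "<hg hash> <git hash>" pair per
line, as the JEP paragraph describes. The function names and the 12-character
minimum for abbreviated hashes are my own assumptions, not anything taken
from Skara.

```python
import re

# Assumption: hashes in commit messages are lowercase hex, 12 to 40 characters.
HEX_TOKEN = re.compile(r"\b[0-9a-f]{12,40}\b")

def parse_mapping(text):
    """Parse lines of '<hg-hash> <git-hash>' into a dict, skipping malformed lines."""
    mapping = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            mapping[parts[0]] = parts[1]
    return mapping

def translate_hashes(message, mapping):
    """Replace hg hashes (full, or an unambiguous prefix) with git hashes."""
    def replace(match):
        token = match.group(0)
        hits = [git for hg, git in mapping.items() if hg.startswith(token)]
        if len(hits) != 1:
            return token  # unknown or ambiguous: leave it for manual review
        return hits[0][:len(token)]  # keep the original abbreviation length
    return HEX_TOKEN.sub(replace, message)
```

Abbreviated hashes keep their length after translation, and anything unknown
or ambiguous is left untouched so it can be checked by hand.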

>
> However, even if we have a translate tool this is still complicated:
> Changing hashes or links in a commit again alters the git hash and the
> translation is wrong for this particular commit. This could be a problem if
> a commit is referenced by more than one other commit or if commit a
> references commit b, which in turn references commit c.
>

Traversing the commit graph in a topological order and rewriting hashes
based on the mapping (updated by past rewrites) seems like it should be
fine to me.

I don't see how a commit can refer to a hash of a commit that descends from
it, for basically the same reason (putting the hash into commit A changes
the hash of its child commit B, so A can't refer to B by hash). I know
that's true for git, but I'm not familiar with hg so I might be missing
something about how hashes work there though.
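
As a sanity check of the topological-order idea, here is a toy Python model.
It is only a sketch: commit_hash is a made-up stand-in that hashes the parents
and the message, not how git or hg actually compute hashes, and the function
and variable names are mine. It only handles full 40-character hashes.

```python
import hashlib
import re
from graphlib import TopologicalSorter  # Python 3.9+

FULL_HASH = re.compile(r"\b[0-9a-f]{40}\b")

def commit_hash(parents, message):
    """Toy stand-in for a real commit hash: depends on parents and message."""
    return hashlib.sha1("\n".join([*parents, message]).encode()).hexdigest()

def rewrite_history(commits, hg_to_git):
    """Rewrite hg hashes in commit messages, visiting parents before children.

    commits:   {git_hash: (list_of_parent_git_hashes, message)}
    hg_to_git: {hg_hash: git_hash} as produced by the initial conversion
    Returns (new_commits, hg_to_new), where hg_to_new is the updated mapping.
    """
    git_to_hg = {git: hg for hg, git in hg_to_git.items()}
    old_to_new, hg_to_new, new_commits = {}, {}, {}
    graph = {old: set(parents) for old, (parents, _) in commits.items()}
    for old in TopologicalSorter(graph).static_order():
        parents, message = commits[old]
        new_parents = [old_to_new[p] for p in parents]
        # Every ancestor already has its final hash, so this lookup is stable.
        new_message = FULL_HASH.sub(
            lambda m: hg_to_new.get(m.group(0), m.group(0)), message)
        new = commit_hash(new_parents, new_message)
        new_commits[new] = (new_parents, new_message)
        old_to_new[old] = new
        if old in git_to_hg:
            hg_to_new[git_to_hg[old]] = new
    return new_commits, hg_to_new

# Example: a linear history a <- b <- c where later messages cite earlier hg hashes.
hg_a, hg_b, hg_c = "1" * 40, "2" * 40, "3" * 40
git_a = commit_hash([], "first commit")
git_b = commit_hash([git_a], "fix bug from " + hg_a)
git_c = commit_hash([git_b], "follow-up to " + hg_b + " and " + hg_a)
commits = {
    git_c: ([git_b], "follow-up to " + hg_b + " and " + hg_a),
    git_a: ([], "first commit"),
    git_b: ([git_a], "fix bug from " + hg_a),
}
new_commits, hg_to_new = rewrite_history(
    commits, {hg_a: git_a, hg_b: git_b, hg_c: git_c})
```

Because a message can only cite ancestors, each cited commit has already been
rewritten by the time its hash is substituted, so a single pass is enough in
this model, including the "a references b references c" chain.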

>
> On 24/08/2019 12.30, David Tellenbach wrote:
>
> Also, if we stayed with Mercurial but used a different provider, we couldn't
> modify the history, because that would change all the hashes (but then
> only the 9 direct links to "bitbucket.org/..." you found would be broken,
> which is acceptable, IMO)
>
> Of course we can just ignore these links (though I think broken
> links/hashes are even worse than non-existing ones ...)
>
> Another point is the links inside the codebase that point to Bitbucket.
> Following the same logic as above I use
> hg grep "bitbucket.org"
> and get 11 links (all seem to be the same). Again something fixable
> manually.
>
>
> Agreed, this part is easy to fix manually.
>
git filter-branch <https://git-scm.com/docs/git-filter-branch> could also
fix them all throughout the entire history (just run sed on all the files
to rewrite the links). Not sure if rewriting the history is desirable, but
it would definitely be easy after they're all in git.
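
For illustration, the per-commit rewrite could look roughly like the Python
sketch below instead of sed. The function name is mine, and the replacement
host in the usage note is a placeholder, since the new location of the
repository isn't decided in this thread.

```python
import os

def rewrite_links(root, old, new):
    """Replace `old` with `new` in every UTF-8 text file under `root`.

    Returns the list of paths that were modified.
    """
    changed = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Never descend into git's own metadata.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # newline="" preserves the file's existing line endings.
                with open(path, encoding="utf-8", newline="") as f:
                    text = f.read()
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
            if old in text:
                with open(path, "w", encoding="utf-8", newline="") as f:
                    f.write(text.replace(old, new))
                changed.append(path)
    return changed
```

Something like rewrite_links(".", "bitbucket.org", "example.org") could then
be run on each checked-out tree via the --tree-filter option of git
filter-branch, where "example.org" stands in for whatever the real new host
turns out to be.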

I've used git-remote-hg to import Eigen with git-subtree before, and it
worked fine. Looking now, there are more alternatives than I found 4 years
ago, including forks of that project, so there are more choices to make. It
does have support for putting the Mercurial revisions in Git commit notes,
which addresses some of the concerns around recording the mapping.

My two cents about the larger question in this thread: I find git much more
familiar to work with as an occasional contributor and debugger. Getting
from a diff to a pull request with a VCS I don't use regularly is
nontrivial, and Eigen is the only place I've interacted with hg. Being
unfamiliar with the VCS is an even bigger barrier to understanding the
history of a project than changing it. I find myself doing that a lot more
often than actually contributing. Trying to understand what's been
cherry-picked ("grafted from" for hg I think?) into various branches to
verify whether fixes for bugs introduced in other commits have landed there
has been particularly problematic for me with Eigen.
