On 01.04.2014, at 15:15, Jeff King <p...@peff.net> wrote:

> On Tue, Apr 01, 2014 at 10:07:03PM +0900, Mike Hommey wrote:
>>> For my own curiosity, how does this differ from what is in
>>> contrib/remote-helpers/git-remote-hg?
>> contrib/remote-helpers/git-remote-hg does a local mercurial clone before
>> doing the git conversion. While this is not really a problem for most
>> mercurial projects, it tends to be slow with big ones, like the firefox
>> source code. What I'm aiming at is something that can talk directly to a
>> remote mercurial server.
> Ah, that makes sense. Thanks for explaining.

Hm, myself, I am not quite convinced. Yes, there is an overhead, but it is 
one-time (well, the space overhead is not, but Mike only mentioned time, not 
space). I wonder if it is really worth the effort to start yet another project 
on this... Moreover, I don't see a fundamental reason why one could not modify 
git-remote-hg to work this way. At least optionally - myself, I would strongly 
prefer the current way, as translating between git and hg 100% round trip clean 
is provably impossible [1].

Thing is, there are by now more than half a dozen projects of this kind. In my 
impression, all pick the low-hanging fruit, some go slightly beyond that, but 
*none* solves all the tough parts and nitty-gritty details...

Just to mention a few of the problems that are usually ignored, even though 
they have real world impact:

- the concept of Mercurial branches has no counterpart in git, making all kinds 
of translations hard. As a consequence, many translators ignore hg branches 
completely (e.g. hg-git -- at least it used to do that, not sure whether that 
changed) or handle them only partially (e.g. 
contrib/remote-helpers/git-remote-hg: It does not deal with multiple heads or 
with closed branches)
(this can cause severe issues with git-remote-hg, by the way, with dangling 
refs, which, when pruned by an auto-gc, can wipe your fast-import marks file, 
causing major pain...)

- in the other direction, git branches most closely correspond to hg bookmarks. 
But what if a hg repository has both a branch "foo" and a bookmark "foo"? 
git-remote-hg partially deals with that (by mapping the hg bookmark "foo" to 
the git branch "foo", and mapping the hg branch "foo" to the git branch 
"branches/foo"), but this still has issues (besides being annoying for users, 
it clearly still does not avoid ref name conflicts)

- git and hg also allow different character sets in branch and bookmark names

- in hg you can simultaneously have things called "foo" and "foo/bar". In git, 
you can't.
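
To illustrate the last point, here is a minimal sketch of how a translator 
could detect such name clashes before creating any git refs. The function 
name and the sample names are made up for illustration; this is not taken 
from git-remote-hg or any other existing tool.

```python
def find_ref_conflicts(names):
    """Return (prefix, longer) pairs that git cannot hold side by side.

    Git treats ref names like paths, so "foo" and "foo/bar" cannot coexist:
    "foo" would have to be a ref and a directory of refs at the same time.
    """
    name_set = set(names)
    conflicts = []
    for name in sorted(name_set):
        parts = name.split("/")
        # Check every proper prefix, e.g. "a" and "a/b" for the name "a/b/c".
        for i in range(1, len(parts)):
            prefix = "/".join(parts[:i])
            if prefix in name_set:
                conflicts.append((prefix, name))
    return conflicts


# Hypothetical hg branch/bookmark names.
hg_names = ["default", "foo", "foo/bar", "release/1.0"]
print(find_ref_conflicts(hg_names))  # [('foo', 'foo/bar')]
```

What to do with a detected conflict (rename, skip, abort) is a separate 
policy question that this sketch does not try to answer.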

There is plenty more. Of course, some of this might just be impossible [1] to 
handle nicely. But I find it kind of sad that everybody seems to prefer to 
start yet another solution, then leave it as 80%, instead of trying to improve 
upon existing work :-(.

By the way, to get back to the speed bottleneck: We found that by far the 
slowest part in importing large repositories like the Mozilla one was not the 
initial cloning of the hg repository (although that could sometimes take ages) 
but rather an unfortunate mismatch between the hg and git storage approach. 
When creating a fast-import stream, the normal way to go about that is to 
import things commit by commit. But if you do that, then extracting file data 
from Mercurial and its revlog data format can easily degenerate into worst-case 
quadratic runtime :-/. Now, if one knows that one is going to import the 
whole repository anyway, one could do better by first exporting all file 
revisions, generating many blobs and their marks, and keeping these in memory, 
*then* exporting the commits, referring to these blob marks. 
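
A rough sketch of that two-pass idea, assuming the file revisions and 
commits have already been read into plain Python structures (the 
`file_revisions` and `commits` layouts below are invented for this example; 
a real helper would pull them out of the revlogs). It only handles a linear 
history on a single ref:

```python
import io


def write_blob(out, mark, data):
    # One fast-import "blob" command with an explicit mark.
    out.write(b"blob\nmark :%d\ndata %d\n" % (mark, len(data)))
    out.write(data)
    out.write(b"\n")


def export_two_pass(file_revisions, commits, out):
    """Pass 1: emit every file revision as a blob and remember its mark.
    Pass 2: emit the commits, referring to the blobs by mark only."""
    blob_marks = {}
    next_mark = 1
    for key, data in file_revisions.items():
        write_blob(out, next_mark, data)
        blob_marks[key] = next_mark
        next_mark += 1

    parent = None
    for c in commits:
        msg = c["message"].encode("utf-8")
        out.write(b"commit refs/heads/master\n")
        out.write(b"mark :%d\n" % next_mark)
        out.write(b"committer %s %d +0000\n"
                  % (c["committer"].encode("utf-8"), c["time"]))
        out.write(b"data %d\n" % len(msg))
        out.write(msg)
        out.write(b"\n")
        if parent is not None:
            out.write(b"from :%d\n" % parent)
        for path, rev in sorted(c["files"].items()):
            # No blob data here, just the mark from pass 1.
            out.write(b"M 100644 :%d %s\n"
                      % (blob_marks[(path, rev)], path.encode("utf-8")))
        out.write(b"\n")
        parent = next_mark
        next_mark += 1
    return blob_marks


# Hypothetical input: (path, file revision number) -> contents.
file_revisions = {("a.txt", 0): b"one\n", ("a.txt", 1): b"two\n"}
commits = [
    {"message": "first", "committer": "A <a@example.com>", "time": 0,
     "files": {"a.txt": 0}},
    {"message": "second", "committer": "A <a@example.com>", "time": 1,
     "files": {"a.txt": 1}},
]
stream = io.BytesIO()
export_two_pass(file_revisions, commits, stream)
```

The point is that each file's revisions are read once, in order, instead of 
being looked up again for every commit. The price is that `blob_marks` has 
to stay in memory for the whole run.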

However, this stops being a great idea once you are working in incremental 
mode. That said, it certainly would make sense to investigate this possibility 
(regardless of whether one uses a local hg clone or directly talks to the 
remote repository); at least in theory, even if one only uses this approach 
during the initial import, it should be a strict improvement over the current 

In closing, I should mention that the problems caused by translating between hg 
and git concepts are by far not the only ones; the fast-import interface itself 
still has limitations that make some things annoying. E.g. when a remote is 
renamed, the remote helper does not know that, which can lead to awkward 
situations that right now may require some trickery to resolve correctly, if it 
is possible at all. Or if a user manually removed a commit that a remote-helper 
previously referenced in a marks file, and that remote helper then uses that 
marks file, fast-import just dies, complaining about the invalid mark. As a 
result, every proper remote helper basically would need to fully parse and 
verify those marks files, detect "broken" marks, and deal with that -- there is 
no way to benefit from the existing mark verification code in fast-import right 
now.
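
For what it is worth, the kind of check a helper would have to do on its own 
looks roughly like this. It is only a sketch: it assumes the usual marks 
file layout of one ":<number> <sha1>" pair per line, and the function names 
are mine, not from any existing helper.

```python
import subprocess


def parse_marks(lines):
    """Parse lines of the form ':<number> <sha1>' into {number: sha1}."""
    marks = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        mark, sha = line.split(" ", 1)
        marks[int(mark.lstrip(":"))] = sha
    return marks


def git_object_exists(sha, repo="."):
    # `git cat-file -e` exits with status 0 only if the object exists.
    result = subprocess.run(
        ["git", "-C", repo, "cat-file", "-e", sha],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def split_marks(marks, exists=git_object_exists):
    """Separate marks whose objects are still present from 'broken' ones."""
    valid = {m: s for m, s in marks.items() if exists(s)}
    broken = {m: s for m, s in marks.items() if m not in valid}
    return valid, broken
```

A helper could run `split_marks` over its marks file before handing it to 
fast-import. Spawning one git process per mark would be slow for a large 
repository, so a real implementation would probably want to batch the 
lookups.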

Please don't get me wrong. I don't want to whine, and I hope I can contribute 
to solving some of these issues at some point (though lack of time is a nasty 
issue). In the meantime, I'd love it if other people were interested in improving 
one of the existing solutions to the problem (such as git-remote-hg, gitifyhg 
or hg-git), instead of creating yet another half-way solution... :-)


[1] That is, unless you are willing to use a custom server, such as Kiln 
Harmony <http://blog.fogcreek.com/kiln-harmony-internals-the-basics/>. But that 
is cheating, as this is not a real round-trip conversion; rather, you keep a 
git and a hg repository in perfect sync all the time and present them as a 
single entity to the outside world.
