On Fri, Nov 2, 2012 at 5:41 PM, Felipe Contreras
<felipe.contre...@gmail.com> wrote:
> On Fri, Nov 2, 2012 at 3:48 PM, Jeff King <p...@peff.net> wrote:
>> On Thu, Nov 01, 2012 at 05:08:52AM +0100, Felipe Contreras wrote:
>>> > Turns out msysgit's remote-hg is not exporting the whole repository,
>>> > that's why it's faster =/
>>> It seems the reason is that it would only export to the point where
>>> the branch is checked out. After updating to the tip I noticed
>>> there was a performance difference.
>>> I investigated and found two reasons:
>>> 1) msysgit's version doesn't export files twice, I've now implemented the same
>>> 2) msysgit's version uses a very simple algorithm to find out file changes
>>> This second point causes msysgit to miss some file changes. Using the
>>> same algorithm I get the same performance, but the output is not
>>> correct.
>> Do you have a test case that demonstrates this? It would be helpful for
>> reviewers, but also helpful to msysgit people if they want to fix their
>> implementation.
> Cloning the mercurial repo:
> % hg log --stat -r 131
> changeset:   131:c9d51742471c
> parent:      127:44538462d3c8
> user:        j...@edge2.net
> date:        Sat May 21 11:35:26 2005 -0700
> summary:     moving hgweb to mercurial subdir
>  hgweb.py           |  377
> ------------------------------------------------------------------------------------------
>  mercurial/hgweb.py |  377
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 377 insertions(+), 377 deletions(-)
> % git show --stat 1f9bcfe7cc3d7af7b4533895181acd316ce172d8
> commit 1f9bcfe7cc3d7af7b4533895181acd316ce172d8
> Author: j...@edge2.net <none@none>
> Date:   Sat May 21 11:35:26 2005 -0700
>     moving hgweb to mercurial subdir
>  mercurial/hgweb.py | 377
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 377 insertions(+)

I talked with some people in #mercurial, and apparently there is a
concept of a 'changelog' that is supposed to store these changes, but
since the format has changed, the content of it is unreliable. That's
not a big problem because it's used mostly for reporting purposes
(log, query), not for doing anything reliable.

To reliably see the changes, one has to compare the 'manifests' of the
revisions involved, each of which lists *all* the files in its revision.
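
As a rough illustration of that comparison, here is a minimal sketch that
treats a manifest as a plain {path: file node} dictionary. The function
name and the node values are made up for the example; this is not
mercurial's API, just the idea of diffing two full file listings:

```python
def diff_manifests(old, new):
    """Compare two manifests given as {path: file_node} dicts.

    Returns (added, modified, removed) as sorted lists of paths.
    """
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    modified = sorted(p for p in new if p in old and old[p] != new[p])
    return added, modified, removed


# Hypothetical manifests for the parent and the child revision;
# the node values are placeholders, not real hashes.
parent = {"hgweb.py": "n1", "mercurial/hg.py": "n2"}
child = {"mercurial/hgweb.py": "n3", "mercurial/hg.py": "n2"}

added, modified, removed = diff_manifests(parent, child)
print("added:", added)
print("modified:", modified)
print("removed:", removed)
```

With listings like these, the move of hgweb.py shows up as one removed
path plus one added path, which is the part a changelog-only approach
can miss.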

That's what I was doing already, but I found a more efficient way to
do it. msysGit is using the changelog, which is quite fast, but not

Unfortunately, while going through mercurial's code, I found an issue,
and it turns out that 1) is not correct.

In mercurial, a file hash also covers the parent file nodes, which
means that even if two file revisions have the same content, they would
not have the same hash. So there's no point in keeping track of them to
avoid extracting the data unnecessarily: in order to make sure they are
different, you need to extract the data anyway, defeating the purpose.

Which means mercurial doesn't really behave as one would expect:

# add files with the same content

  $ echo a > a
  $ hg ci -Am adda
  adding a
  $ echo a >> a
  $ hg ci -m changea
  $ echo a > a
  $ hg st --rev 0
  $ hg ci -m reverta
  $ hg log -G --template '{rev} {desc}\n'
  @  2 reverta
  o  1 changea
  o  0 adda

# check the difference between the first and the last revision

  $ hg st --rev 0:2
  M a
  $ hg cat -r 0 a
  $ hg cat -r 2 a
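
The session above can be modelled with a few lines of Python. This is a
sketch of how a file node is derived as far as I understand it (SHA-1
over the two sorted parent nodes followed by the text); file_node is a
name made up for the example, and real filelogs may also carry copy
metadata, which is ignored here:

```python
import hashlib

NULLID = b"\0" * 20  # node used when a parent is missing


def file_node(text, p1=NULLID, p2=NULLID):
    """Hash the sorted parent nodes followed by the file content."""
    first, second = sorted([p1, p2])
    s = hashlib.sha1(first)
    s.update(second)
    s.update(text)
    return s.digest()


# Same sequence as the session: add 'a', change it, revert it.
rev0 = file_node(b"a\n")               # adda
rev1 = file_node(b"a\na\n", p1=rev0)   # changea
rev2 = file_node(b"a\n", p1=rev1)      # reverta, same content as rev0

print("same content:", True)
print("same node:", rev0 == rev2)
```

Revisions 0 and 2 hold identical content, but their nodes differ because
the parents differ, so comparing nodes alone cannot tell that the file
content is back to where it started.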

I will check again where the performance improvements came from,
but most likely it's from my implementation of
mercurial's repo.status().


Felipe Contreras