On Sat, Jul 25, 2020 at 12:27:42 +0200, Antonio Muci via Mercurial-devel wrote:
> That's sad.

Yeah.

This motivated me enough to clone the repos (hg and git) and collect some
data.  Maybe people here will find it useful.

First off, the clone itself.  I cloned it from the official upstream repos.
My internet connection is 150 Mbit/s and the storage is a 3-way ZFS mirror.  I
used hg 4.9.1 (py27) and git 2.21.0.  (I know, I need to update both.  This
is on a box that has a solid network connection but is harder to update.  If
there is interest I can spend the effort to update them and re-run it with
newer versions.)

$ hg clone https://hg.openjdk.java.net/jdk/jdk
destination directory: jdk
requesting all changes
adding changesets
adding manifests
adding file changes
added 60318 changesets with 516970 changes to 187542 files
new changesets fd16c54261b3:227cd01f15fa
updating to branch default
65415 files updated, 0 files merged, 0 files removed, 0 files unresolved

This took a total of ~16.3 mins (978 seconds), of which:

 1) ~30 seconds were used by "adding changesets"
 2) ~8 mins were used by "adding manifests"
 3) ~7 mins were used by "adding file changes"

While adding manifests and file changes, the box was receiving ~1.0-1.2 MB/s
(bytes received on the NIC, *not* the actual payload inside the TCP and
hg-specific framing).

My box still had plenty of CPU, RAM, and I/O left so I don't know if the 1.0
MB/s was a result of hg being sub-optimal or if the hg server or the network
connection were the bottleneck.
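
To put the ~1.0-1.2 MB/s into perspective against the 150 Mbit/s link, here
is a quick back-of-the-envelope sketch in Python.  It assumes decimal units
(1 MB = 10^6 bytes, 8 bits per byte) and ignores protocol overhead; the
function names are made up for illustration.

```python
# Figures quoted in this mail.
LINK_MBIT_PER_S = 150.0           # advertised internet connection speed
OBSERVED_MB_PER_S = (1.0, 1.2)    # NIC receive rate during the hg clone


def link_capacity_mb_per_s(mbit_per_s):
    """Convert a link speed in Mbit/s to MB/s (assumes 8 bits per byte)."""
    return mbit_per_s / 8.0


def link_utilization(observed_mb_per_s, mbit_per_s):
    """Fraction of the nominal link capacity that the observed rate uses."""
    return observed_mb_per_s / link_capacity_mb_per_s(mbit_per_s)


capacity = link_capacity_mb_per_s(LINK_MBIT_PER_S)
for rate in OBSERVED_MB_PER_S:
    share = link_utilization(rate, LINK_MBIT_PER_S)
    print(f"{rate:.1f} MB/s is {share:.1%} of {capacity:.2f} MB/s")
```

That works out to roughly 5-6% of what my link can nominally do, which still
doesn't say whether the server or the path to it was the limit.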

To rule out internet slowness, I ran 'hg serve' on the clone and did a clone
on my laptop (5.5rc0+25-fbc53c5853b0, py3) on the same subnet (wifi
connected).  It took 495 seconds (2x faster), and I saw slightly higher
network utilization (~1.7 MB/s) and the laptop CPU pegged at 100% for pretty
much the entire duration of the "adding file changes" portion.  (The laptop
has an SSD, so that probably helped eliminate some of the slowness - it is a
bit of an apples-to-oranges comparison, but interesting nonetheless.)

Cloning directly from java.net on my laptop took 1400 seconds - so, about
43% slower than the 978 seconds on the first box.  This could be because of
the wifi, py3 vs. py27, the hg version difference, etc.


$ git clone https://github.com/openjdk/jdk.git jdk-git
Cloning into 'jdk-git'...
remote: Enumerating objects: 819, done.
remote: Counting objects: 100% (819/819), done.
remote: Compressing objects: 100% (577/577), done.
remote: Total 1072595 (delta 356), reused 423 (delta 199), pack-reused 1071776
Receiving objects: 100% (1072595/1072595), 414.42 MiB | 6.17 MiB/s, done.
Resolving deltas: 100% (800673/800673), done.
Checking out files: 100% (65415/65415), done.

This took a total of 1 min 49 secs (109 seconds), of which:

 1) 1 min 8 secs were used by "receiving objects"
 2) 25 seconds were used by "resolving deltas"

While receiving objects, the box was pulling in ~6.8 MB/s.

Cloning directly on my laptop took 99 seconds with git version 2.26.2.
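
For a side-by-side view of the wall-clock numbers, here is a small Python
sketch that divides the hg clone times by the git clone times.  The numbers
are the ones reported in this mail; the dictionary layout and the function
name are only for illustration.  Keep in mind that the two machines ran
different hg and git versions, so this is a rough comparison at best.

```python
# Wall-clock clone times in seconds, as reported in this mail.
CLONE_SECONDS = {
    "first box": {"hg": 978, "git": 109},
    "laptop": {"hg": 1400, "git": 99},
}


def slowdown(slow_seconds, fast_seconds):
    """How many times longer the slower run took than the faster one."""
    return slow_seconds / fast_seconds


for machine, times in CLONE_SECONDS.items():
    factor = slowdown(times["hg"], times["git"])
    print(f"{machine}: hg took {factor:.1f}x as long as git")
```

That is about 9x on the first box and about 14x on the laptop.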

...
> About .hg size (1a): is it really true that .hg is 1.2GB and the 
> corresponding .git version is 300 MB? Verifying it should not be too 
> difficult. If it's true (I doubt it), something has to be done.

$ du -shA jdk-*/.{hg,git}
1.10G   jdk-hg/.hg
452M    jdk-git/.git

So, both of the quoted numbers (1.2GB and 300MB) seem to be tweaked to
justify migration - at least compared to a fresh clone - but I'd say hg is
still worse by 2-3x.
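
The ratio can be worked out from the du output with a few lines of Python.
This is only a sketch: it assumes du -h scales by powers of 1024, reads the
quoted "1.2GB" and "300 MB" with those same units, and parse_du_size is a
made-up helper name.

```python
# Binary multipliers, assuming du -h scales by 1024.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_du_size(text):
    """Turn a du -h style size such as '452M' or '1.10G' into bytes."""
    text = text.strip()
    if text and text[-1].upper() in UNITS:
        return float(text[:-1]) * UNITS[text[-1].upper()]
    return float(text)  # no suffix: treat as plain bytes


hg_bytes = parse_du_size("1.10G")   # jdk-hg/.hg
git_bytes = parse_du_size("452M")   # jdk-git/.git
print(f".hg is {hg_bytes / git_bytes:.2f}x the size of .git")

# The sizes claimed in the quoted text, read with the same units.
claimed = parse_du_size("1.2G") / parse_du_size("300M")
print(f"claimed ratio: {claimed:.2f}x")
```

So the fresh clones come out at about 2.5x, versus about 4.1x for the
claimed sizes.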

The whole checkout in case anyone cares:

$ du -shA *
1014M   jdk-git
1.65G   jdk-hg

Now, hg specifics.  It looks like the manifest is huge, which matches how
long the "adding manifests" step took during the clone.

-rw-r--r--   1 jeffpc   jeffpc     25.2M Jul 25 12:16 00changelog.d
-rw-r--r--   1 jeffpc   jeffpc     3.68M Jul 25 12:01 00changelog.i
-rw-r--r--   1 jeffpc   jeffpc      434M Jul 25 12:09 00manifest.d
-rw-r--r--   1 jeffpc   jeffpc     3.67M Jul 25 12:09 00manifest.i

Not a complete surprise given that there are a lot of files (~65k) tracked
and many use super-long file paths (e.g.,
test/hotspot/jtreg/runtime/exceptionMsgs/AbstractMethodError/AbstractMethodErrorTest.java).
That adds up.  Just the paths in the manifest itself add up to almost 4.7MB.

$ hg manifest | wc
   65415   65415 4694467
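
To get a feel for what the wc numbers mean per revision, here is a rough
Python sketch.  It assumes the flat (non-tree) manifest stores one entry per
file laid out as path, NUL byte, 40 hex digits, optional flag character,
newline; that layout is my understanding and is not verified here, and the
function names are made up.

```python
# Output of `hg manifest | wc` from this mail: lines and bytes.
TRACKED_FILES = 65415
PATH_BYTES = 4694467          # includes one newline per path

# Assumed flat manifest entry layout: path, NUL, 40 hex digits, flags, newline.
NUL_AND_HASH_BYTES = 1 + 40


def average_path_length(path_bytes, files):
    """Mean path length in bytes, not counting the trailing newline."""
    return path_bytes / files - 1


def flat_manifest_bytes(path_bytes, files):
    """Rough size of one full manifest text, ignoring flag characters."""
    return path_bytes + files * NUL_AND_HASH_BYTES


avg = average_path_length(PATH_BYTES, TRACKED_FILES)
full = flat_manifest_bytes(PATH_BYTES, TRACKED_FILES)
print(f"average path: {avg:.1f} bytes")
print(f"one full manifest: ~{full / 1e6:.2f} MB")
```

Under those assumptions the average path is about 71 bytes and a single full
manifest text is a bit over 7 MB, so even with deltas, 60318 changesets'
worth of manifest revisions can plausibly add up to hundreds of MB.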

I'm guessing that they would have benefited from treemanifest.


I also tried to clone locally to see what sort of thing a user would see.

$ hg clone jdk-hg test
$ git clone jdk-git test-git

hg took 60 seconds (with a hot cache; ~120 secs with a cold cache), git took
13 seconds.  Git hardlinked the one big pack file and its index, while hg
hardlinked each of the files in .hg/store.  Obviously, hardlinking 2 files is
much faster than hardlinking ~180k.  (treemanifest would have made this even
worse for hg.)
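
To check the file count on your own clones, something like the following
Python sketch could count the regular files under a directory and how many
of them have more than one hardlink.  The function name and the
command-line handling are made up for illustration; it relies only on
os.walk and os.lstat.

```python
import os
import stat
import sys


def count_store_files(store_dir):
    """Return (regular files, files with link count > 1) under store_dir."""
    total = 0
    linked = 0
    for root, _dirs, files in os.walk(store_dir):
        for name in files:
            st = os.lstat(os.path.join(root, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks and other special files
            total += 1
            if st.st_nlink > 1:
                linked += 1
    return total, linked


if __name__ == "__main__":
    # Pass one or more directories, e.g. test/.hg/store test-git/.git/objects
    for path in sys.argv[1:]:
        total, linked = count_store_files(path)
        print(f"{path}: {total} files, {linked} hardlinked")
```

Running it on the store of the local hg clone and on the objects directory
of the local git clone should show the difference in the number of links
each tool had to create.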


I just kicked off a conversion to treemanifest.  It'll take a while.

Jeff.

-- 
Intellectuals solve problems; geniuses prevent them
                - Albert Einstein
_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
